Natural Language Processing

Abhishek k
Jun 9, 2022



Over the years, programming languages have served as the means by which human beings communicate with computer systems. Many different programming languages have been developed, differing in functionality, field of application, and learning complexity.

A notable example regarding complexity is an analysis by Springboard of which languages are the hardest and the easiest to learn [1]. According to the article, one of the easiest languages to learn is Python, since its syntax uses English words and it provides a less intimidating programming environment for the user. On the other hand, C++ is a highly complex language with complicated syntax, which is easier to learn for users who already have a foundational programming background.

Thanks to technological advances, a current topic of interest is the use of natural written or spoken commands, rather than a formal programming language, for high-level development. The technology that makes this possible is Natural Language Processing (NLP). NLP is a branch of Artificial Intelligence (A.I.) that enables computer systems to understand human written or spoken language, known as natural language [2].

In this article, we describe and explain the concept and technology of NLP, as well as its advancements and functional properties.

The Technology Of NLP

NLP relies on two processes to handle human language: Natural Language Understanding and Natural Language Generation [3].

During Natural Language Understanding, the computer attempts to work out the meaning of the text given by the user from its nature and structure. It does so by using lexicons and grammatical rules and by resolving different types of ambiguity. The next step is Natural Language Generation, in which the computer produces readable text from structured data through text planning, sentence planning, and realization.

To make NLP technology easier to understand, we list the main algorithms it uses [4].

  • Bag of Words: A model that counts words by building an occurrence matrix for the target text. It ignores grammar and word order, and the resulting word frequencies are used as training features for NLP classifiers.
  • Tokenization: The process of splitting a piece of text into sentences and words, called tokens. As part of this step, punctuation can be removed to simplify further processing.
  • Stop Words Removal: A step that removes very common words, such as articles, prepositions, and pronouns, which add little to no value to the text. One benefit is that the user can choose in advance the list of words to exclude.
  • Stemming: Removes affixes from words to reduce them to a stem. Its drawback is that it can produce stems that are not real words or that change the meaning. Apart from that, stemming is fast and simple, and it can be used for quick spelling correction of the tokens.
  • Lemmatization: Converts tokens into their base (dictionary) form so that they can be grouped with related word forms. It can also help with disambiguation, since it takes the context of the word into account.
  • Topic Modeling: One of the more advanced techniques in this list. It analyzes the structure of texts to uncover the topics they contain and the content related to each topic.
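
The first few steps in the list above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the way a production NLP library implements them: the stop-word list, the suffix rules in `naive_stem`, and the sample sentences are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"the", "a", "an", "is", "on", "of", "and", "to", "in"}

def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())

def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop-word list."""
    return [t for t in tokens if t not in stop_words]

def naive_stem(token):
    """Strip a few common English suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def bag_of_words(documents):
    """Build a shared vocabulary and one row of word counts per document."""
    processed = [
        [naive_stem(t) for t in remove_stop_words(tokenize(doc))]
        for doc in documents
    ]
    vocabulary = sorted({t for tokens in processed for t in tokens})
    matrix = []
    for tokens in processed:
        counts = Counter(tokens)
        matrix.append([counts[word] for word in vocabulary])
    return vocabulary, matrix

docs = ["The cat sat on the mat.", "The cats are sitting on the mats!"]
vocabulary, matrix = bag_of_words(docs)
print(vocabulary)  # ['are', 'cat', 'mat', 'sat', 'sitt']
print(matrix)      # [[0, 1, 1, 1, 0], [1, 1, 1, 0, 1]]
```

Note how the crude stemmer turns "sitting" into "sitt", which is not a real word. This is the stemming drawback mentioned above, and it is the kind of case lemmatization is meant to handle better.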

Transformers and their variants


In 2017, the paper “Attention Is All You Need” [5] introduced an innovative architecture called the Transformer, designed for NLP and other sequence-to-sequence tasks.

The Transformer is designed to solve sequence-to-sequence tasks while handling long-range dependencies. It consists of two main parts, the Encoder and the Decoder, each made up of sub-components that support the core architecture. Based on [6] and [7], we present the sub-components of the Encoder and the Decoder.


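Encoder:

  • Multi-head self-attention mechanism, dedicated to relating each word to the other words of the same input text, so that words with a related context are connected.
  • Feed-forward network, dedicated to transforming the output of the attention mechanism before it is passed on to the next layer and, finally, to the decoder.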


Decoder:

  • Masked multi-head self-attention mechanism, similar to the one in the encoder, but applied to the output sequence generated so far.
  • Multi-head attention mechanism over the encoder output, dedicated to combining the encoded input with the decoder state to generate the output sequence.
  • Feed-forward network, similar to the one in the encoder, except that its result is passed on as output.

Figure 1: The originally proposed architecture of Transformer models

As a result, the Transformer uses Deep Learning techniques to recognize how the words of a sequence relate to one another in terms of context.
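
To make the attention mechanism more concrete, here is a minimal sketch of scaled dot-product attention, the operation at the heart of each attention head in [5]. It is a simplification under stated assumptions: it uses a single head, plain Python lists instead of tensors, and no learned projection matrices, so the queries, keys, and values are simply the (hypothetical) token embeddings themselves.

```python
import math

def softmax(scores):
    """Turn a list of scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values):
    """Return one output vector per query, plus the attention weights."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-dimensional embeddings for a three-token sentence.
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(embeddings, embeddings, embeddings)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how strongly one token attends to every token in the sequence. A full Transformer layer runs several such heads in parallel on learned projections of the input and concatenates their results.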


Bidirectional Encoder Representations from Transformers, also known as BERT, is a model introduced by Google AI Language that builds on the Transformer architecture. It is considered an advancement over the typical approach of processing text sequences in a single direction, because it is trained bidirectionally with a technique called Masked LM [8].

One of the main differences between BERT and the original Transformer is that BERT uses only the Encoder part of the architecture, since its goal is to produce a language model. Rather than reading the text word by word in one direction, it processes the whole word sequence at once, which lets it learn the context of each word from the words on both sides of it.

BERT is trained with two approaches. The first is Masked LM, mentioned above: some of the words in each sequence are hidden, and the model is trained to predict them from the surrounding context, which allows it to outperform left-to-right models. The second is Next Sentence Prediction (NSP), in which the model receives pairs of sentences and learns to predict whether the second sentence follows the first in the original text.
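
The data preparation behind Masked LM can be sketched as follows. This is a simplified illustration, not BERT's actual preprocessing code: it works on whitespace-separated words instead of subword tokens, and it always substitutes `[MASK]`, whereas BERT sometimes keeps the original word or inserts a random one instead. The function name `mask_tokens` and its parameters are assumptions made for the example; the 15% masking rate is the one reported for BERT.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_probability=0.15, seed=None):
    """Replace a random subset of tokens with [MASK] and record the originals."""
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}  # position -> original token the model should predict
    for position, token in enumerate(tokens):
        if rng.random() < mask_probability:
            masked[position] = MASK_TOKEN
            labels[position] = token
    return masked, labels

tokens = "the man went to the store to buy a gallon of milk".split()
masked, labels = mask_tokens(tokens, mask_probability=0.15, seed=0)
print(masked)
print(labels)
```

During training, the model sees `masked` as input and is scored only on how well it predicts the words stored in `labels`, using context from both the left and the right of each hidden position.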


The Generative Pre-Trained Transformer (GPT) is another example of technological evolution in NLP. GPT models perform well on tasks such as question answering and text summarization. They require few to no instructions or examples to recognize a task, thanks to the large volume of data they are pretrained on [9].

The first three GPT models are explained in the following list [10].

· Generative Pre-Training for Language Understanding (GPT-1): This model is first pretrained on unlabeled data and then fine-tuned on labeled data for specific tasks, such as classification and question answering. It consists of a 12-layer decoder, with a structure similar to the decoder of the Transformer. The pretraining used BookCorpus as its source of data.

· Generative Pre-Training for Unsupervised Multitasking Learning (GPT-2): An upgraded version of GPT-1, with two extra features. Task Conditioning lets the model produce different outputs for different tasks from the same input. Zero-shot learning and zero-shot task transfer enable GPT-2 to receive an input without any examples and to work out the task from the instruction or the nature of the input itself. The dataset and architecture are also upgraded: the source, called WebText, contains about 40GB of text collected from web pages linked in highly upvoted Reddit posts, and the largest model has 48 layers.

· Generative Pre-Training for few-shot learning models (GPT-3): Like its predecessor, GPT-3 is an upgraded version of the previous model. Its new features are In-Context Learning and the few-shot, one-shot, and zero-shot settings. With In-Context Learning, the model picks up patterns in text sequences during training and then uses them at inference time to recognize the task described in its input, without any update to its weights. The three settings differ in how many examples of the task are included in the input: several, one, or none. The architecture is similar to GPT-2, with the number of layers increased to 96. The training data is a mixture of larger datasets, including an expanded version of the WebText data used for GPT-2.
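
The difference between the zero-shot, one-shot, and few-shot settings comes down to how the input text is assembled. The sketch below builds the three kinds of prompt for a translation task; the helper `build_prompt` and the "Input:"/"Output:" layout are hypothetical choices made for illustration, not a format required by any GPT model.

```python
def build_prompt(task_description, examples, query):
    """Assemble a text prompt from a task description, optional examples, and a query."""
    lines = [task_description, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

task = "Translate English to French."
demos = [("cheese", "fromage"), ("sea otter", "loutre de mer")]

zero_shot = build_prompt(task, [], "peppermint")        # no examples
one_shot = build_prompt(task, demos[:1], "peppermint")  # a single example
few_shot = build_prompt(task, demos, "peppermint")      # several examples

print(few_shot)
```

In all three cases the model's weights stay fixed. The examples only appear in the input text, and the model is expected to continue the pattern after the final "Output:".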

NLP Applications

According to Tableau, a data analytics and visualization company, the most important applications combining A.I. and NLP technologies are the following [11].

· Email Filters: One of the most basic NLP applications, originating from the well-known spam filters of email providers. The filters detect individual words or sequences of words that signal a category, and classify the emails by content into primary, social, promotional, or spam.

· Smart Assistants: The most popular examples are Siri by Apple and Alexa by Amazon. Smart assistants use voice recognition to detect words and speech patterns in order to provide a useful response or perform simple tasks in smart home infrastructures. Tableau expects that smart assistants will soon be able to handle more complex tasks, such as actions that involve a third party or more advanced conversations.

· Predictive Text: A simple example of this NLP application is the search engine. It uses the words typed by the user to suggest relevant completions before the query is fully typed, and to return relevant results in general. Autocorrect and autocomplete on search engines, smartphones, and tablets are further examples of predictive text. Such systems often use Deep Learning algorithms so that they adapt to the user and become more accurate over time.

· Data and Text Analytics: NLP is gradually being integrated into Data Analytics, giving users easier tools for efficient data visualization behind a simplified interface. People are starting to be able to use natural language to choose the visualization method that suits their data. Text analytics is another field that benefits from NLP: the user can input unstructured text and receive meaningful data from the NLP application. For example, businesses use NLP to detect reviews, posts, or comments that mention their brand.
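
As a small illustration of the predictive text idea from the list above, the sketch below suggests the next word based on how often word pairs occurred in earlier text. It is a simple count-based stand-in for the Deep Learning models used in real products, and the function names and the sample typing history are assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_bigram_model(sentences):
    """Count, for every word, which words follow it in the training sentences."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            followers[current_word][next_word] += 1
    return followers

def suggest_next(model, word, k=3):
    """Return up to k words most often seen after the given word."""
    counts = model.get(word.lower())
    if not counts:
        return []
    return [candidate for candidate, _ in counts.most_common(k)]

# Hypothetical typing history of a user.
history = [
    "natural language processing is fun",
    "natural language generation is hard",
    "natural language processing is useful",
]
model = train_bigram_model(history)
suggestions = suggest_next(model, "language")
print(suggestions)  # ['processing', 'generation']
```

Because the counts come from the user's own history, the suggestions adapt as more text is typed, which is the same adaptive behaviour that larger models provide with far more context.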


In this article, we discussed the concept of NLP technology and its basic functions. We then studied the Transformer model and its variants, such as BERT and GPT, and mentioned some of the major NLP application fields. NLP has the potential to develop into one of the smartest technologies available to the public, since its techniques and algorithms are constantly being improved and upgraded.


[1] “The easiest and hardest languages to learn,” Springboard. (accessed Feb. 27, 2022).

[2] B. Lutkevich, “What is Natural Language Processing? An Introduction to NLP,” 2021. (accessed Feb. 27, 2022).

[3] “Natural Language Processing Applications and Techniques.” (accessed Feb. 28, 2022).

[4] D. Lopez Yse, “Your Guide to Natural Language Processing (NLP),” Towards Data Science, 2019. (accessed Feb. 28, 2022).

[5] A. Vaswani et al., “Attention Is All You Need,” 2017.

[6] “Transformers in NLP: A beginner friendly explanation,” Towards Data Science. (accessed Mar. 01, 2022).

[7] Maxime, “What is a Transformer?,” Inside Machine learning, Medium, 2019. (accessed Mar. 01, 2022).

[8] R. Horev, “BERT Explained: State of the art language model for NLP,” Towards Data Science, 2018. (accessed Mar. 01, 2022).

[9] D. Ajayi, “How BERT and GPT models change the game for NLP,” Watson Blog, 2020. (accessed Mar. 01, 2022).

[10] P. Shree, “GPT models explained. Open AI’s GPT-1, GPT-2, GPT-3,” Walmart Global Tech Blog, 2020. (accessed Mar. 01, 2022).

[11] “Natural Language Processing (NLP) Examples,” Tableau. (accessed Mar. 02, 2022).