Understanding the Technology behind the Generative Pre-Trained Transformer


Context: India will explore building large language models, says the Principal Scientific Advisor of India.

Natural Language Processing (NLP): 

  • NLP deals with giving computers the ability to understand text and spoken words in much the same way human beings can.
  • NLP combines human language with statistical, machine learning, and deep learning models. Together, these technologies enable computers to process human language in the form of text or voice data and to ‘understand’ its full meaning, complete with the speaker or writer’s intent and sentiment.
  • NLP drives computer programs that translate text from one language to another, respond to spoken commands, and summarise large volumes of text rapidly, even in real time. Examples include voice-operated GPS systems, digital assistants, speech-to-text dictation software, and customer service chatbots.

Understanding Large Language Model: 

  • Humans perceive text as a collection of words: sentences are sequences of words, and documents are sequences of chapters, sections, and paragraphs. Computers, however, process one word or character at a time and produce an output only once the entire input has been consumed.
  • This sequential approach works, but such a model sometimes forgets what happened at the beginning of the sequence by the time it reaches the end. To address this, computer scientists developed the transformer model as a better approach.

Transformer Model: 

 Process of Tokenisation 

  • To process a text input in a transformer model, the computer first tokenises the text input into a sequence of words.
  • These tokens are then encoded as numbers and converted into embeddings, which are vector-space representations of the tokens that preserve their meaning. 
  • Next, the encoder in the transformer transforms the embeddings of all the tokens into a context vector, which captures the essence of the entire text input. Using this vector as a clue, the transformer decoder generates an output word.
  • The same decoder is then reused, but this time the clue is the word it just produced.
  • This process can be repeated to create an entire paragraph, starting from a leading sentence. This process is known as auto-regression.
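The pipeline described above can be sketched in a few lines of Python. This is a toy illustration, not a real transformer: the vocabulary, the embedding table, the mean-pooling "encoder", and the lookup-table "decoder" are all hypothetical stand-ins for what attention layers and learned weights do in practice.

```python
# Toy sketch of: tokenise -> encode as numbers -> embed -> context vector
# -> auto-regressive decoding. All values below are made up for illustration.

# Step 1: tokenise the text input into a sequence of words.
def tokenise(text):
    return text.lower().split()

# Step 2: encode tokens as numbers via a (toy) vocabulary.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}

def encode(tokens):
    return [vocab[t] for t in tokens]

# Step 3: convert token ids into (toy) 2-dimensional embedding vectors.
embeddings = {
    0: [0.1, 0.0],
    1: [0.9, 0.2],
    2: [0.4, 0.8],
    3: [0.2, 0.1],
    4: [0.8, 0.3],
}

# Step 4: "encoder" - pool all token embeddings into one context vector.
# (A real transformer uses self-attention; mean-pooling is a stand-in.)
def context_vector(ids):
    vecs = [embeddings[i] for i in ids]
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(2)]

# Step 5: auto-regression - each generated word becomes the next clue.
# Here the "decoder" is just a table of most-likely next words.
next_word = {"the": "cat", "cat": "sat", "sat": "on", "on": "the"}

def generate(seed, steps):
    out = [seed]
    for _ in range(steps):
        out.append(next_word[out[-1]])
    return " ".join(out)
```

Calling `generate("the", 3)` repeats step 5 three times, feeding each new word back in as the clue, which is exactly the auto-regression loop described above.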

In this model, the grammar of the output may not always be correct. In reality, the transformer model does not explicitly store grammar rules; instead, it acquires them implicitly from examples.

Large Language Model (LLM):

  • A large language model is a transformer model on a mass scale. 
  • It is so large that it usually cannot be run on a single computer. 
  • Such a large model must be trained on a vast amount of text before it can capture the patterns and structures of language.
  • For this reason, it is typically offered as a service over an API or a web interface.

For example, GPT-3, the model behind the ChatGPT service, was trained on massive amounts of data from the internet, including books, articles, and websites. During this training it learned the statistical relationships between words, phrases, and sentences, allowing it to generate contextually relevant responses.
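To make "learning statistical relationships between words" concrete, here is a far simpler model of the same idea: a bigram counter that learns, from a tiny made-up corpus, which word most often follows which. GPT-3 learns vastly richer relationships with neural networks, but the statistical intuition is the same.

```python
# A minimal illustration of learning word-to-word statistics from text.
# The corpus below is invented for the example.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug ."

# Count which word follows which (bigram statistics).
counts = defaultdict(Counter)
words = corpus.split()
for prev, nxt in zip(words, words[1:]):
    counts[prev][nxt] += 1

def most_likely_next(word):
    # Return the statistically most frequent follower of `word`.
    return counts[word].most_common(1)[0][0]
```

After "training" on the corpus, `most_likely_next("sat")` returns "on", because "on" is the most frequent word to follow "sat" in the data; a large language model generalises this idea to entire contexts rather than single preceding words.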

Other notable examples of this technology are Google’s PaLM (used in Bard), Meta’s LLaMA, BLOOM, Baidu’s Ernie 3.0 Titan, and Anthropic’s Claude 2.

LLM Use Cases:

  • Linguistic diversity and inclusion: India is a linguistically diverse country, with 22 official languages and hundreds of dialects. LLMs can enable people to access information and services in their own language, reducing the digital divide and fostering greater inclusivity.
  • Economic growth and innovation: LLMs can transform Indian industries such as healthcare, education, and manufacturing. For example, they can be used to develop new educational tools, improve healthcare consultations, and automate customer service tasks.
  • Personalisation and training: LLMs can generate training data, modules, etc., and power educational and similar tools personalised to each person’s needs, making education and the delivery of services more effective.
  • Use in Research and Development (R&D): LLMs can be used to analyse large datasets of scientific data to identify patterns and trends that would be difficult or impossible for humans to find on their own. E.g., IIT-D is using LLM to find Malaria drugs.
  • Use in Intelligence: By using LLMs to analyse large datasets of social media posts, satellite imagery, financial transactions, and human intelligence, agencies can identify potential threats, extremist propaganda, etc. For example, the US National Security Agency is using LLMs to analyse large amounts of social media data to identify potential terrorist threats. Hence, LLMs can be used to develop new tools for disseminating early warnings, cybersecurity, social media monitoring, etc.
Source: The Hindu
