
EDITOR’S QUESTION
questions, templates for writing, and summarizations of text can be created by generating content through sequences of tokens, one sequence at a time (the ‘sequence’ here can be interpreted as a sequence of tokens ranging from 1 to N tokens, and a token is typically a fragment of one or more words).
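As a rough illustration of that token-by-token process, the Python sketch below generates text one token at a time. The tiny vocabulary and the next_token_probs function are toy stand-ins for a real model, not any actual API.

import random

random.seed(0)

# Toy stand-in for an LLM's vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["answers", "summaries", "templates", "one", "token", "at", "a", "time", "."]

def next_token_probs(context):
    # A real LLM returns a probability for every vocabulary token,
    # conditioned on the sequence generated so far; here it is uniform.
    return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

def generate(prompt, max_new_tokens=10):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        # Sample the next token from the distribution; greedy decoding
        # (always taking the most probable token) is a common alternative.
        next_tok = random.choices(list(probs), weights=list(probs.values()))[0]
        tokens.append(next_tok)
        if next_tok == ".":  # a real model stops at an end-of-sequence token
            break
    return tokens

print(" ".join(generate(["Write", "a", "summary", ":"])))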
Historical progress in NLP evolved from structural to symbolic, to statistical, to (neural network) pre-trained language models (PLMs) and lastly to LLMs. Lately we have also seen techniques for the distillation of LLMs and the generation of small language models, but I don’t want to digress.
Language modeling before the era of deep learning focused on training task-specific models through supervision, whereas PLMs are trained through self-supervision with the aim of learning representations that are common across different NLP tasks. As the size of PLMs increased, so did their performance on tasks. That led to LLMs, which significantly increased the number of model parameters and the size of the training dataset.
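The difference between task-specific supervision and self-supervision can be sketched in a few lines of Python; the examples and function names below are purely illustrative.

# Task-specific supervision: every training example needs a human-provided label.
supervised_example = {"text": "The movie was wonderful", "label": "positive"}

def self_supervised_examples(text):
    # Self-supervision for a causal language model: the "labels" are simply
    # the next tokens of the text itself, so no annotation is required and
    # any large text corpus can be used for pre-training.
    tokens = text.split()
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

for context, target in self_supervised_examples("large models learn general representations"):
    print(context, "->", target)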
GPT-3 was the first model to achieve, purely via text interaction with the model, “strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.”
Today’s LLMs accurately respond to task queries when prompted with task descriptions and examples. However, pre-trained LLMs fail to follow user intent and perform worse in zero-shot settings than in few-shot settings. Fine-tuning is known to enhance generalization to unseen tasks, improving zero-shot performance significantly. Other improvements relate to either task-specific training or better prompting.
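The contrast between zero-shot and few-shot prompting comes down to how the prompt is built. In the sketch below, call_llm is a hypothetical placeholder for whatever model endpoint is actually used; only the prompt construction is the point.

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: in practice this would call a hosted or local model.
    return "<model output>"

task = "Classify the sentiment of the review as positive or negative."
review = "The battery barely lasts two hours."

# Zero-shot: only the task description and the input.
zero_shot_prompt = f"{task}\nReview: {review}\nSentiment:"

# Few-shot: the same task description plus a handful of worked examples,
# which pre-trained (not instruction-tuned) models typically need in order
# to follow the user's intent.
few_shot_prompt = (
    f"{task}\n"
    "Review: The plot was gripping from start to finish.\nSentiment: positive\n"
    "Review: Shipping took a month and the box arrived crushed.\nSentiment: negative\n"
    f"Review: {review}\nSentiment:"
)

print(call_llm(zero_shot_prompt))
print(call_llm(few_shot_prompt))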
The abilities of LLMs to solve diverse tasks with human-level performance come at the cost of slow training and inference, extensive hardware requirements and higher running costs. Such constraints are hard to accept, and they have driven better architectures and training strategies. Parameter-efficient tuning, pruning, quantization,