The recent hype surrounding Generative Pre-trained Transformer 3 (GPT-3), the new artificial intelligence (AI) based natural language processing (NLP) model, deserves a close look, particularly from an enterprise perspective. Whether you study it carefully or only glance at it, this latest language model, which generates human-like written content, repays the time and effort, and it suggests that the hype is largely justified. Like every technological innovation, GPT-3 has its shortcomings, yet it is a great leap for AI.
What is GPT-3?
In May 2020, OpenAI, an AI research lab co-founded by Elon Musk, Sam Altman, and others, introduced GPT-3, the latest version of its AI-based natural language processing system, which can mimic human language. This 175-billion-parameter deep learning language model was trained on large text datasets containing hundreds of billions of words.
GPT-3, at its core, is a transformer model: a sequence-to-sequence deep learning model that produces a sequence of text when given an input sequence of text. This machine learning (ML) model is designed for text generation tasks such as question answering, machine translation, and text summarization.
Unlike Long Short-Term Memory (LSTM) neural network models, which process a sequence one step at a time, Transformer models use multiple units called attention blocks that focus on the most relevant parts of a text sequence. LSTMs have long been used in domains such as machine translation and speech recognition, to name a few.
The Evolution of GPT-3
OpenAI launched its first GPT natural language processing model in 2018, followed by GPT-2 in 2019. GPT-3 is the third version of OpenAI’s GPT series. These three machine learning models illustrate the exponential pace of innovation in language models. Much of the credit goes to two major advances that emerged around 2015. The first is called Attention, and the second, Unsupervised Learning.
Attention: Making AI Conscious
Yoshua Bengio, an A.M. Turing Award-winning AI scientist, and his team observed that the NLP models of the time, when compressing and decompressing English-language sentences, crammed every sentence into a fixed-length vector irrespective of the length of the sentence.
This rigid approach stood as an obstacle for Bengio and his team. To find the words that maximize the conditional probability, an NLP model should be able to search across many vectors rather than a single fixed-length one. So they devised a way to let neural network models flexibly compress words into a sequence of vectors whose length varies with the sentence, and to search freely across those vectors for relevant context. This mechanism is known as Attention.
Attention rapidly became a crucial element in natural language processing models. Two years later, Google used it to create the Transformer, a neural network architecture for language tasks. In 2018, building on the Transformer, Google created Bidirectional Encoder Representations from Transformers (BERT), another highly successful language model. The Transformer also became the foundation of GPT-1.
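The idea of searching across a variable number of vectors can be illustrated with a minimal sketch. It assumes dot-product scoring (the form later used in the Transformer) rather than the exact scoring function Bengio's team used, and the two-dimensional word vectors are hypothetical values chosen only for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Score every key against the query, then blend the values by those scores."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context


# Hypothetical encodings of a three-word sentence (one vector per word).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = keys
query = [1.0, 0.0]

weights, context = attend(query, keys, values)
print(weights)  # one weight per input word, however long the sentence is
print(context)  # a single blended vector summarizing the relevant words
```

Because `attend` loops over whatever list of vectors it receives, the same function works for a three-word sentence or a thirty-word one, which is the flexibility a single fixed-length vector lacks.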
Unsupervised Learning: Clustering Data Without Human Intervention
The freedom that attention gives a model to scan different parts of the provided text and find conditional dependencies on a broader scale was complemented by another innovation in 2015. Known as unsupervised learning, it became an even more crucial element in OpenAI’s GPT development.
During that period, most language models used supervised learning with labeled data. Labeled data consists of an input paired with the desired output. The difference between the model's actual output and the desired output is measured by an objective function, also known as a loss function, and it indicates how well the model performs. The smaller the difference, the better optimized the language model. Once the loss is sufficiently small, the neural network is considered trained.
Producing the desired output with labeled data requires extensive data curation involving human judgment, a time-consuming and resource-intensive task. However, Quoc Le and Andrew Dai of Google found that if an NLP model was first trained in an unsupervised way, the amount of labeled data needed could be reduced.
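To make the loss function concrete, here is a minimal sketch using cross-entropy, a common loss for language models. The article does not say which loss any particular model uses, so treat that choice as an assumption; the three-word vocabulary and the probabilities are invented for illustration.

```python
import math


def cross_entropy(predicted_probs, target_index):
    """Loss is small when the model puts high probability on the desired output."""
    return -math.log(predicted_probs[target_index])


vocab = ["cat", "dog", "mat"]      # hypothetical tiny vocabulary
target = vocab.index("mat")        # the desired (labeled) output

untrained = [0.4, 0.4, 0.2]        # model barely favors any word
trained = [0.05, 0.05, 0.9]        # model strongly favors the desired word

loss_before = cross_entropy(untrained, target)
loss_after = cross_entropy(trained, target)
print(loss_before, loss_after)     # the second loss is the smaller one
```

Training amounts to adjusting the model's parameters so that this number shrinks across many labeled examples.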
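One reason unsupervised pre-training needs no human labels is that raw text can supply its own training targets. The sketch below assumes a next-word prediction objective, which is one common choice and not necessarily the exact setup Le and Dai used; the function name and the sample sentence are hypothetical.

```python
def next_word_pairs(text, context_size=3):
    """Build (context, next word) training pairs from unlabeled text."""
    words = text.split()
    pairs = []
    for i in range(1, len(words)):
        context = words[max(0, i - context_size):i]  # up to context_size prior words
        pairs.append((context, words[i]))            # the "label" is just the next word
    return pairs


pairs = next_word_pairs("the cat sat on the mat")
for context, target in pairs:
    print(context, "->", target)
```

Every sentence on the web yields pairs like these for free, so a model can learn general language patterns first and then be fine-tuned on a much smaller labeled set.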
GPT-3 in the Making
In 2018, OpenAI combined the attention mechanism with the unsupervised pre-training approach, which allows a neural network to compress and decompress large amounts of text to reproduce the original text. As part of this process, GPT-1 was trained to compress and decompress the contents of BookCorpus, a 5GB dataset compiled by the University of Toronto and MIT that consists of the text of over 7,000 unpublished books. The content totaled nearly a billion words.
Speculating that the accuracy of the machine learning model could be improved by feeding it more data, the OpenAI team widened the scope of the model’s data ingestion. GPT-2 ingested the text of a homegrown dataset consisting of eight million web pages totaling 40GB of data.
GPT-3 took text ingestion, compression, and decompression to another level. It consumed the text content of the popular CommonCrawl dataset of web pages from 2016 to 2019. The raw data is nominally 45 terabytes of compressed text, although OpenAI curated it to eliminate redundancy and improve quality.
GPT-3 and Enterprise Digital Transformation Initiatives
GPT-3 is the third-generation GPT natural language processing model created by OpenAI. Size is what differentiates GPT-3 from its predecessors. With 175 billion parameters, GPT-3 is more than 100 times as large as GPT-2, whose largest version has 1.5 billion parameters, and about ten times as large as Microsoft’s 17-billion-parameter Turing NLG model. In short, GPT-3, with 96 attention blocks, each containing 96 attention heads, is basically a giant transformer model.
GPT-3 is popular mainly because of its ability to perform a wide range of natural language tasks and produce human-like text. Here are a few tasks that GPT-3 can perform:
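The size comparisons can be checked with a few lines of arithmetic. The parameter counts for GPT-2 (1.5 billion, its largest version) and Turing NLG (17 billion) are the published figures; the hidden width of 12,288 and the `12 * layers * width^2` rule of thumb are assumptions brought in from outside this article, and the rule ignores embeddings and biases.

```python
GPT3_PARAMS = 175e9
GPT2_PARAMS = 1.5e9          # largest GPT-2 model
TURING_NLG_PARAMS = 17e9     # Microsoft Turing NLG

ratio_gpt2 = GPT3_PARAMS / GPT2_PARAMS
ratio_turing = GPT3_PARAMS / TURING_NLG_PARAMS
print(round(ratio_gpt2), round(ratio_turing))  # roughly 117 and 10


def approx_transformer_params(n_layers, d_model):
    """Rule-of-thumb parameter count: about 12 * d_model^2 weights per block."""
    return 12 * n_layers * d_model ** 2


# 96 attention blocks, as stated in the text; d_model = 12288 is an outside assumption.
estimate = approx_transformer_params(n_layers=96, d_model=12288)
print(estimate / 1e9)  # close to the quoted 175 billion
```

The estimate lands within about one percent of the headline figure, which suggests that nearly all of GPT-3's parameters sit inside its 96 attention blocks.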
- Sentiment analysis
- Question answering
- Text generation
- Text summarization
- Named-entity recognition
- Language translation
We can think of GPT-3 as an ML model that mimics human writing and reading comprehension remarkably well, having seen more text than any human will ever read in a lifetime. But its performance and utility are not limited to that.
Enterprise Applications of GPT-3
GPT-3 has broad applications for enterprises. The most immediate use cases include improved customer service, responses to employee queries, and task automation. AI can also take on legal tasks such as contract analysis and compliance enforcement, enabling employees to focus on higher-value work.
GPT-3 allows users to translate natural language into Structured Query Language (SQL) queries. Similar capabilities help users create application layouts without technical knowledge, build spreadsheets, generate complex Cascading Style Sheets (CSS), or even deploy Amazon Web Services (AWS) or Microsoft Azure instances.
GPT-3 can also write simpler versions of complicated technical instructions without human intervention. Its prose can be convincing, too: a blogger used GPT-3 to write blog posts that performed exceptionally well on Hacker News, a tech news aggregator.
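Natural-language-to-SQL translation is typically set up by showing the model a few worked examples and leaving the last answer blank for it to complete. The sketch below only builds such a prompt as a string; it makes no API call, and the function name, table names, and example queries are all hypothetical.

```python
def build_sql_prompt(examples, question):
    """Assemble a few-shot prompt that ends where the model should continue."""
    lines = ["Translate each request into a SQL query.", ""]
    for request, sql in examples:
        lines.append(f"Request: {request}")
        lines.append(f"SQL: {sql}")
        lines.append("")
    lines.append(f"Request: {question}")
    lines.append("SQL:")  # left open so the model writes the query
    return "\n".join(lines)


# Hypothetical worked examples against made-up tables.
examples = [
    ("List all customers.", "SELECT * FROM customers;"),
    ("Count the orders placed in 2020.",
     "SELECT COUNT(*) FROM orders WHERE order_year = 2020;"),
]

prompt = build_sql_prompt(
    examples, "Show the names of employees in the sales department."
)
print(prompt)
```

The resulting text would be sent to a text-completion model, and whatever it generates after the final `SQL:` is the candidate query, which should be reviewed before it is run against a real database.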
GPT-3: A Powerful Digital Transformation Tool
GPT-3 is a powerful tool for digitally transforming an enterprise because it addresses a wide range of natural language processing problems. The evidence so far suggests that feeding more data, more computing power, and more training time into the GPT-3 architecture yields astonishing results.