Generative pre-trained transformer
Generative pre-trained transformer (GPT) refers to a family of language models generally trained on a large corpus of text data to generate human-like text. These models are built from stacked blocks of the transformer architecture and can be fine-tuned for various natural language processing tasks such as text generation, language translation, and text classification. The "pre-training" in the name refers to the initial training process on a large text corpus, during which the model learns to predict the next word in a passage; this provides a solid foundation for the model to perform well on downstream tasks with limited amounts of task-specific data.
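The pre-training objective can be made concrete with a short sketch. The following is a minimal, hypothetical illustration (assuming PyTorch, with a toy stand-in model rather than any released GPT implementation) of how a next-word prediction loss is computed from a batch of token sequences:

```python
# Minimal sketch of the next-token ("language modeling") objective.
# Toy stand-in model for illustration; not any released GPT implementation.
import torch
import torch.nn as nn

vocab_size, embed_dim, seq_len, batch = 100, 32, 16, 4

# Stand-in "model": embedding followed by a linear projection back to the vocabulary.
# A real GPT stacks transformer blocks between these two layers.
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # a batch of token ids

inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens up to t
logits = model(inputs)                           # (batch, seq_len - 1, vocab_size)

loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()  # gradients for one pre-training update
```

During pre-training this loss is minimized over a very large unlabeled corpus, which gives the model its general-purpose language ability before any task-specific fine-tuning.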
Uses
- ChatGPT (Chat Generative Pre-trained Transformer)[1] is a chatbot launched by OpenAI in November 2022. It is built on GPT-3.5 and fine-tuned (an approach to transfer learning)[2] using both supervised and reinforcement learning techniques.
- BioGPT is a GPT, developed by Microsoft,[4] that focuses on answering biomedical questions.[3]
- ProtGPT2 is a GPT that focuses on protein design.[5]
History
On June 11, 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training", in which they introduced the Generative Pre-trained Transformer (GPT).[6] At that time, the best-performing neural NLP models primarily employed supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use on datasets that were not well-annotated, in addition to making it prohibitively expensive and time-consuming to train extremely large models;[6][7] many languages (such as Swahili or Haitian Creole) are difficult to translate and interpret using such models due to a lack of available text for corpus-building.[7] In contrast, GPT's "semi-supervised" approach involved two stages: an unsupervised generative "pre-training" stage in which a language modeling objective was used to set initial parameters, and a supervised discriminative "fine-tuning" stage in which these parameters were adapted to a target task.[6]
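The two-stage recipe can be sketched schematically. The example below is a minimal illustration assuming PyTorch; the toy data and module names (`backbone`, `lm_head`, `clf_head`) are hypothetical stand-ins for the transformer stack, the language-modeling head used in pre-training, and the task head added for fine-tuning:

```python
# Schematic sketch of the two-stage approach: (1) unsupervised language-model
# pre-training, then (2) supervised fine-tuning on a target task.
# Toy model and data; hypothetical, for illustration only.
import torch
import torch.nn as nn

vocab_size, embed_dim, num_classes = 100, 32, 2
backbone = nn.Embedding(vocab_size, embed_dim)   # stand-in for the transformer stack
lm_head = nn.Linear(embed_dim, vocab_size)       # used during pre-training
clf_head = nn.Linear(embed_dim, num_classes)     # added for the target task

# Stage 1: generative pre-training on unlabeled text (language-modeling objective).
opt = torch.optim.Adam(list(backbone.parameters()) + list(lm_head.parameters()), lr=1e-3)
tokens = torch.randint(0, vocab_size, (8, 16))   # unlabeled token sequences
logits = lm_head(backbone(tokens[:, :-1]))
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()

# Stage 2: discriminative fine-tuning of the same parameters on a small labeled set.
opt = torch.optim.Adam(list(backbone.parameters()) + list(clf_head.parameters()), lr=1e-4)
labels = torch.randint(0, num_classes, (8,))     # task labels for the same sequences
features = backbone(tokens).mean(dim=1)          # crude pooling over the sequence
loss = nn.functional.cross_entropy(clf_head(features), labels)
opt.zero_grad()
loss.backward()
opt.step()
```

The key point is that the parameters learned in stage 1 serve as the initialization for stage 2, so far less labeled data is needed than when training a comparable model from scratch.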
| Model | Architecture | Parameter count | Training data |
|---|---|---|---|
| GPT-1 | 12-level, 12-headed Transformer decoder (no encoder), followed by linear-softmax. | 0.12 billion | BookCorpus:[8] 4.5 GB of text, from 7,000 unpublished books of various genres. |
| GPT-2 | GPT-1, but with modified normalization. | 1.5 billion | WebText: 40 GB of text, 8 million documents, from 45 million webpages upvoted on Reddit. |
| GPT-3 | GPT-2, but with modifications to allow larger scaling. | 175 billion | 570 GB plaintext, 0.4 trillion tokens. Mostly CommonCrawl, WebText, English Wikipedia, and two books corpora (Books1 and Books2). |
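As an illustration of the decoder-only design summarized in the table above, the sketch below stacks masked self-attention blocks and ends with a linear layer followed by a softmax over the vocabulary. It is a minimal, hypothetical example assuming PyTorch; the dimensions are arbitrary toy values, and details such as the placement of layer normalization vary between GPT versions:

```python
# Minimal sketch of a decoder-style transformer block with causal (masked)
# self-attention, plus the final linear-softmax head over the vocabulary.
# Toy dimensions; normalization placement and other details vary across GPT versions.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, dim, heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        # Causal mask: position t may only attend to positions <= t.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + attn_out)       # residual connection + normalization
        return self.norm2(x + self.ff(x))  # feed-forward sub-layer

vocab, dim, layers = 1000, 96, 12          # 12 blocks, 12 heads, toy width
embed = nn.Embedding(vocab, dim)
blocks = nn.Sequential(*[DecoderBlock(dim) for _ in range(layers)])
head = nn.Linear(dim, vocab)               # linear layer ...

tokens = torch.randint(0, vocab, (2, 10))
probs = torch.softmax(head(blocks(embed(tokens))), dim=-1)  # ... followed by softmax
```

In an actual GPT model the hidden width is much larger and positional information is added to the token embeddings before the first block.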
References
- ^ Roose, Kevin (5 December 2022). "The Brilliance and Weirdness of ChatGPT". The New York Times. Archived from the original on 18 January 2023. Retrieved 26 December 2022. "Like those tools, ChatGPT — which stands for 'generative pre-trained transformer' — landed with a splash."
- ^ Quinn, Joanne (2020). Dive into deep learning: tools for engagement. Thousand Oaks, California. p. 551. ISBN 9781544361376. Archived from the original on 10 January 2023. Retrieved 10 January 2023.
- ^ Luo, R.; Sun, L.; Xia, Y.; Qin, T.; Zhang, S.; Poon, H.; et al. (2022). "BioGPT: generative pre-trained transformer for biomedical text generation and mining". Briefings in Bioinformatics. 23 (6). doi:10.1093/bib/bbac409. PMID 36156661.
- ^ Bastian, Matthias (29 January 2023). "BioGPT is a Microsoft language model trained for biomedical tasks". The Decoder.
- ^ Ferruz, N.; Schmidt, S.; Höcker, B. (2022). "ProtGPT2 is a deep unsupervised language model for protein design". Nature Communications. 13. doi:10.1038/s41467-022-32007-7.
- ^ a b c Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
- ^ a b Tsvetkov, Yulia (22 June 2017). "Opportunities and Challenges in Working with Low-Resource Languages" (PDF). Carnegie Mellon University. Archived (PDF) from the original on 31 March 2020. Retrieved 23 January 2021.
- ^ Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". Proceedings of the IEEE International Conference on Computer Vision (ICCV): 19–27.