Generative pre-trained transformer
Generative pre-trained transformer (GPT) refers to a family of language models generally trained on a large corpus of text data to generate human-like text. These models are built from stacked blocks of the transformer architecture and can be fine-tuned for various natural language processing tasks such as text generation, language translation, and text classification. The "pre-training" in the name refers to the initial training process on a large text corpus, during which the model learns to predict the next word in a passage; this provides a solid foundation for the model to perform well on downstream tasks with limited amounts of task-specific data.
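The pre-training objective can be made concrete with a short sketch. The following is a minimal, hypothetical illustration (assuming PyTorch, with a toy stand-in model rather than any released GPT implementation) of how a next-word prediction loss is computed from a batch of token sequences:

```python
# Minimal sketch of the next-token ("language modeling") objective.
# Toy stand-in model for illustration; not any released GPT implementation.
import torch
import torch.nn as nn

vocab_size, embed_dim, seq_len, batch = 100, 32, 16, 4

# Stand-in "model": embedding followed by a linear projection back to the vocabulary.
# A real GPT stacks transformer blocks between these two layers.
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),
    nn.Linear(embed_dim, vocab_size),
)

tokens = torch.randint(0, vocab_size, (batch, seq_len))  # a batch of token ids

inputs, targets = tokens[:, :-1], tokens[:, 1:]  # predict token t+1 from tokens up to t
logits = model(inputs)                           # (batch, seq_len - 1, vocab_size)

loss = nn.functional.cross_entropy(
    logits.reshape(-1, vocab_size), targets.reshape(-1)
)
loss.backward()  # gradients for one pre-training update
```

During pre-training this loss is minimized over a very large unlabeled corpus, which gives the model its general-purpose language ability before any task-specific fine-tuning.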
Uses
- ChatGPT (Chat Generative Pre-trained Transformer)[1] is a chatbot launched by OpenAI in November 2022. It is built on GPT-3.5 and fine-tuned (an approach to transfer learning)[2] using both supervised and reinforcement learning techniques.
- BioGPT is a GPT, developed by Microsoft,[4] that focuses on answering biomedical questions.[3]
- ProtGPT2 is a GPT that focuses on protein design.[5]
History
On June 11, 2018, OpenAI released a paper entitled "Improving Language Understanding by Generative Pre-Training", in which they introduced the Generative Pre-trained Transformer (GPT).[6] At that time, the best-performing neural NLP models primarily employed supervised learning from large amounts of manually labeled data. This reliance on supervised learning limited their use on datasets that were not well-annotated, in addition to making it prohibitively expensive and time-consuming to train extremely large models;[6][7] many languages (such as Swahili or Haitian Creole) are difficult to translate and interpret using such models due to a lack of available text for corpus-building.[7] In contrast, GPT's "semi-supervised" approach involved two stages: an unsupervised generative "pre-training" stage in which a language modeling objective was used to set initial parameters, and a supervised discriminative "fine-tuning" stage in which these parameters were adapted to a target task.[6]
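The two-stage recipe can be sketched schematically. The example below is a minimal illustration assuming PyTorch; the toy data and module names (`backbone`, `lm_head`, `clf_head`) are hypothetical stand-ins for the transformer stack, the language-modeling head used in pre-training, and the task head added for fine-tuning:

```python
# Schematic sketch of the two-stage approach: (1) unsupervised language-model
# pre-training, then (2) supervised fine-tuning on a target task.
# Toy model and data; hypothetical, for illustration only.
import torch
import torch.nn as nn

vocab_size, embed_dim, num_classes = 100, 32, 2
backbone = nn.Embedding(vocab_size, embed_dim)   # stand-in for the transformer stack
lm_head = nn.Linear(embed_dim, vocab_size)       # used during pre-training
clf_head = nn.Linear(embed_dim, num_classes)     # added for the target task

# Stage 1: generative pre-training on unlabeled text (language-modeling objective).
opt = torch.optim.Adam(list(backbone.parameters()) + list(lm_head.parameters()), lr=1e-3)
tokens = torch.randint(0, vocab_size, (8, 16))   # unlabeled token sequences
logits = lm_head(backbone(tokens[:, :-1]))
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()

# Stage 2: discriminative fine-tuning of the same parameters on a small labeled set.
opt = torch.optim.Adam(list(backbone.parameters()) + list(clf_head.parameters()), lr=1e-4)
labels = torch.randint(0, num_classes, (8,))     # task labels for the same sequences
features = backbone(tokens).mean(dim=1)          # crude pooling over the sequence
loss = nn.functional.cross_entropy(clf_head(features), labels)
opt.zero_grad()
loss.backward()
opt.step()
```

The key point is that the parameters learned in stage 1 serve as the initialization for stage 2, so far less labeled data is needed than when training a comparable model from scratch.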
| Model | Architecture | Parameter count | Training data |
|---|---|---|---|
| GPT-1 | 12-level, 12-headed Transformer decoder (no encoder), followed by linear-softmax. | 0.12 billion | BookCorpus:[8] 4.5 GB of text, from 7,000 unpublished books of various genres. |
| GPT-2 | GPT-1, but with modified normalization. | 1.5 billion | WebText: 40 GB of text, 8 million documents, from 45 million webpages upvoted on Reddit. |
| GPT-3 | GPT-2, but with modifications to allow larger scaling. | 175 billion | 570 GB plaintext, 0.4 trillion tokens. Mostly CommonCrawl, WebText, English Wikipedia, and two books corpora (Books1 and Books2). |
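As an illustration of the decoder-only design summarized in the table above, the sketch below stacks masked self-attention blocks and ends with a linear layer followed by a softmax over the vocabulary. It is a minimal, hypothetical example assuming PyTorch; the dimensions are arbitrary toy values, and details such as the placement of layer normalization vary between GPT versions:

```python
# Minimal sketch of a decoder-style transformer block with causal (masked)
# self-attention, plus the final linear-softmax head over the vocabulary.
# Toy dimensions; normalization placement and other details vary across GPT versions.
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    def __init__(self, dim, heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):
        # Causal mask: position t may only attend to positions <= t.
        t = x.size(1)
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + attn_out)       # residual connection + normalization
        return self.norm2(x + self.ff(x))  # feed-forward sub-layer

vocab, dim, layers = 1000, 96, 12          # 12 blocks, 12 heads, toy width
embed = nn.Embedding(vocab, dim)
blocks = nn.Sequential(*[DecoderBlock(dim) for _ in range(layers)])
head = nn.Linear(dim, vocab)               # linear layer ...

tokens = torch.randint(0, vocab, (2, 10))
probs = torch.softmax(head(blocks(embed(tokens))), dim=-1)  # ... followed by softmax
```

In an actual GPT model the hidden width is much larger and positional information is added to the token embeddings before the first block.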
References
- ^ Roose, Kevin (5 December 2022). "The Brilliance and Weirdness of ChatGPT". The New York Times. Archived from the original on 18 January 2023. Retrieved 26 December 2022. "Like those tools, ChatGPT — which stands for 'generative pre-trained transformer' — landed with a splash."
- ^ Quinn, Joanne (2020). Dive into deep learning: tools for engagement. Thousand Oaks, California. p. 551. ISBN 9781544361376. Archived from the original on 10 January 2023. Retrieved 10 January 2023.
- ^ Luo, R.; Sun, L.; Xia, Y.; Qin, T.; Zhang, S.; Poon, H.; et al. (2022). "BioGPT: generative pre-trained transformer for biomedical text generation and mining". Briefings in Bioinformatics. 23 (6). doi:10.1093/bib/bbac409. PMID 36156661.
- ^ Bastian, Matthias (29 January 2023). "BioGPT is a Microsoft language model trained for biomedical tasks". The Decoder.
- ^ Ferruz, N.; Schmidt, S.; Höcker, B. (2022). "ProtGPT2 is a deep unsupervised language model for protein design". Nature Communications. 13. doi:10.1038/s41467-022-32007-7.
- ^ a b c Radford, Alec; Narasimhan, Karthik; Salimans, Tim; Sutskever, Ilya (11 June 2018). "Improving Language Understanding by Generative Pre-Training" (PDF). OpenAI. p. 12. Archived (PDF) from the original on 26 January 2021. Retrieved 23 January 2021.
- ^ a b Tsvetkov, Yulia (22 June 2017). "Opportunities and Challenges in Working with Low-Resource Languages" (PDF). Carnegie Mellon University. Archived (PDF) from the original on 31 March 2020. Retrieved 23 January 2021.
- ^ Zhu, Yukun; Kiros, Ryan; Zemel, Rich; Salakhutdinov, Ruslan; Urtasun, Raquel; Torralba, Antonio; Fidler, Sanja (2015). "Aligning Books and Movies: Towards Story-Like Visual Explanations by Watching Movies and Reading Books". Proceedings of the IEEE International Conference on Computer Vision (ICCV): 19–27.