Large language model

From Wikipedia, the free encyclopedia

A large language model (LLM) is a general-purpose language model consisting of a neural network with many parameters (i.e., billions of weights or more). LLMs trained on large quantities of unlabelled text perform well at a wide variety of tasks, a development which, since their emergence around 2018, has shifted the focus of natural language processing research away from the previous paradigm of training specialized supervised models for specific tasks.[1]

Properties

Though the term large language model has no formal definition, it generally refers to deep learning models with a parameter count on the order of billions or more.[2] LLMs are general-purpose models that excel at a wide range of tasks, as opposed to being trained for one specific task (such as sentiment analysis, named entity recognition, or mathematical reasoning).[1][3]

Between 2018 and 2020, the standard method for harnessing an LLM for a specific NLP task was to fine-tune the model with additional task-specific training. It has subsequently been found that more powerful LLMs such as GPT-3 can solve tasks without parameter updates using the technique of "few-shot prompting", in which the model is given a text prompt containing a small number of solved examples of a particular task and must complete an unsolved instance at the end of the prompt.[1] For example, a sentiment analysis task of labelling the sentiment of a movie review could be prompted as follows:[3]

Review: This movie stinks.
Sentiment: negative

Review: This movie is fantastic!
Sentiment:

If the model outputs "positive", then it has correctly solved the task. LLMs may also perform well at "zero-shot" prompts, in which they must solve a novel task presented in a text prompt without any preceding examples.[4]
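
Constructing such a prompt is straightforward to automate. The following Python sketch builds a few-shot sentiment prompt in the format shown above and passes it to a text-completion model; the generate() function is a hypothetical stand-in for whatever LLM interface is actually used, and the second labelled example is invented for illustration.

# Minimal sketch of few-shot prompting for sentiment analysis.
# `generate` is a hypothetical stand-in for an LLM text-completion call.

def build_few_shot_prompt(examples, query):
    """Concatenate labelled example reviews, then the unsolved review."""
    parts = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)

examples = [
    ("This movie stinks.", "negative"),
    ("A moving, beautifully acted film.", "positive"),  # invented example
]
prompt = build_few_shot_prompt(examples, "This movie is fantastic!")

# completion = generate(prompt)  # hypothetical model call
# The task is solved correctly if the completion begins with "positive".

The same helper covers the zero-shot case: passing an empty list of examples leaves only the unsolved review in the prompt.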

Architecture

Since 2018, large language models have generally used the transformer architecture (whereas, previously, recurrent architectures such as the LSTM were most common).[1]

LLMs are computationally expensive to train. A 2020 study estimated the cost of training a 1.5 billion parameter model (1-2 orders of magnitude smaller than the state of the art at the time) at $1.6 million.[4]

A 2020 analysis found that neural language models' capability (as measured by training loss) increased smoothly in a power-law relationship with the number of parameters, the quantity of training data, and the computation used for training.[5][6] These relationships were tested over a wide range of values (up to seven orders of magnitude), and no attenuation was observed at the highest end of the range (including for network sizes up to trillions of parameters).[6]
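
As a concrete illustration of what such a power-law relationship looks like, the Python sketch below fits a curve of the form loss(N) = a · N^(−α) to a handful of (parameter count, training loss) pairs. The data points and the resulting constants are invented for illustration only; they are not taken from the cited study.

# Illustrative sketch: fitting a power law loss(N) = a * N**(-alpha)
# to synthetic (parameter count, training loss) measurements.
# The numbers below are invented, not drawn from the cited scaling-law study.
import numpy as np

params = np.array([1e6, 1e7, 1e8, 1e9, 1e10])  # model sizes N
loss = np.array([5.0, 4.2, 3.5, 2.9, 2.4])     # hypothetical training losses

# A power law is linear in log-log space: log(loss) = log(a) - alpha * log(N),
# so a straight-line fit recovers the exponent alpha and the constant a.
slope, intercept = np.polyfit(np.log(params), np.log(loss), 1)
alpha, a = -slope, np.exp(intercept)

print(f"fitted loss ≈ {a:.2f} * N^(-{alpha:.3f})")

An analogous fit applies to training data quantity and training compute; the cited analysis reports a separate power-law exponent for each of the three quantities.[6]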

List of large language models

Name | Year | Developer | Number of parameters[a] | Notes
BERT | 2018 | Google | 340 million[7]
GPT-2 | 2019 | OpenAI | 1.5 billion[8]
GPT-3 | 2020 | OpenAI | 175 billion[4] | A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called ChatGPT in 2022.[9]
GLaM (Generalist Language Model) | 2021 | Google | 1.2 trillion[10]
Megatron-Turing NLG | 2022 | Microsoft and Nvidia | 530 billion[11]
LaMDA (Language Model for Dialogue Applications) | 2022 | Google | 137 billion[12]
PaLM (Pathways Language Model) | 2022 | Google | 540 billion[13]
Chinchilla | 2022 | DeepMind | 70 billion[14]
BLOOM | 2022 | Various | 175 billion[5] | Developed by a team of around 1,000 researchers with funding from the French government and the US company Hugging Face.[5]
LLaMA (Large Language Model Meta AI) | 2023 | Meta | 65 billion[15]

Notes

  1. ^ In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.

References

  1. ^ a b c d Manning, Christopher D. (2022). "Human Language Understanding & Reasoning". Daedalus.
  2. ^ Carlini, Nicholas; Tramer, Florian; Wallace, Eric; Jagielski, Matthew; Herbert-Voss, Ariel; Lee, Katherine; Roberts, Adam; Brown, Tom B; Song, Dawn; Erlingsson, Ulfar (2021). Extracting Training Data from Large Language Models (PDF). USENIX Security Symposium. Vol. 6.
  3. ^ a b Wei, Jason. "Emergent Abilities of Large Language Models".
  4. ^ a b c Wiggers, Kyle (28 April 2022). "The emerging types of language models and why they matter". TechCrunch.
  5. ^ a b c Ananthaswamy, Anil (8 March 2023). "In AI, is bigger always better?". Nature.
  6. ^ a b Kaplan, Jared; McCandlish, Sam; Henighan, Tom; Brown, Tom B.; Chess, Benjamin; Child, Rewon; Gray, Scott; Radford, Alec; Wu, Jeffrey; Amodei, Dario (2020). "Scaling Laws for Neural Language Models". CoRR. abs/2001.08361. arXiv:2001.08361.
  7. ^ Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2 [cs.CL].
  8. ^ "GPT-2: 1.5B Release". OpenAI. 2019-11-05. Archived from the original on 2019-11-14. Retrieved 2019-11-14.
  9. ^ "ChatGPT: Optimizing Language Models for Dialogue". OpenAI. 2022-11-30. Retrieved 2023-01-13.
  10. ^ Dai, Andrew M; Du, Nan (December 9, 2021). "More Efficient In-Context Learning with GLaM". ai.googleblog.com. Retrieved 2023-03-09.
  11. ^ Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia (2022-02-04). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990.
  12. ^ Cheng, Heng-Tze; Thoppilan, Romal (January 21, 2022). "LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything". ai.googleblog.com. Retrieved 2023-03-09.
  13. ^ Narang, Sharan; Chowdhery, Aakanksha (April 4, 2022). "Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance". ai.googleblog.com. Retrieved 2023-03-09.
  14. ^ Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent (12 April 2022). "An empirical analysis of compute-optimal large language model training". DeepMind Blog.
  15. ^ "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. 24 February 2023.