User:AryamanA/List of large language models

From Wikipedia, the free encyclopedia

Revision as of 22:28, 16 March 2023

Encoder-decoder

Non-finetuned

Name | Creator | Parameters | Languages | Trained on | Announced | Access | References
BART | Facebook | 0.4B | English | RoBERTa | October 29, 2019 | Yes | [1]
ERNIE 3.0 | Baidu | 10B | Chinese | ERNIE 3.0 | July 5, 2021 | Yes | [2]
ERNIE 3.0 Titan | Baidu | 260B | Chinese | ERNIE 3.0 | December 23, 2021 | No | [3]
T5 | Google | 11B | English | C4 | February 24, 2020 | Yes | [4][5]
LM-adapted T5 | Google | 11B | English | C4 (more training) | April 18, 2021 | Yes | [6]
mBART | Facebook | 0.68B | Multilingual (25) | CC25 | January 22, 2020 | Yes | [7]
mT5 | Google | 13B | Multilingual (101) | mC4 | October 22, 2020 | Yes | [8]
UL2 | Google | 20B | English | C4 | May 10, 2022 | Yes | [9][10]

Finetuned

Name | Creator | Parameters | Languages | Finetuning | Announced | Access | References
Flan-T5 | Google | 11B | English | T5 (Instruction) | October 20, 2022 | Yes | [11]
mT0 | BigScience | 13B | Multilingual (46) | mT5 (Multitask) | November 3, 2022 | Yes | [12]
T0, T0+, T0++ | BigScience | 11B | English | T5+LM (Multitask) | October 15, 2021 | Yes | [13]
UL2 | Google | 20B | English | UL2 (Instruction) | February 28, 2023 | Yes | [9]

Encoder-only

Non-finetuned

Name | Creator | Parameters | Languages | Trained on | Announced | Access | References
ALBERT | Google | 17M | English | BERT | September 26, 2019 | Yes | [14]
BERT | Google | 340M | English | BERT (BookCorpus, English Wikipedia) | October 11, 2018 | Yes | [15][16]
mBERT | Google | 172M | Multilingual (104) | Wikipedia | November 4, 2018 | Yes | [17][18]
BERT-Base, Chinese | Google | 102M | Chinese | Chinese Wikipedia | November 4, 2018 | Yes | [19][18]
BERTIN | BERTIN Project | 355M | Spanish | mC4 | July 14, 2022 | Yes | [20]
DeBERTa | Microsoft | 1,500M | English | BERT + 3 corpora | June 5, 2020 | Yes | [21]
ELECTRA-Large | Google | 334M | English | XLNet | March 23, 2020 | Yes | [22]
IndicBERT | AI4Bharat | 33M | Multilingual (12) | IndicCorp | September 13, 2020 | Yes | [23]
IndicBERT v2 | AI4Bharat | 278M | Multilingual (24) | IndicCorp v2 | November 13, 2022 | Yes | [24]
MuRIL | Google | 236M | Multilingual (17) | OSCAR, Wikipedia | March 19, 2021 | Yes | [25]
RoBERTa | Facebook, University of Washington | 355M | English | BERT + 3 corpora | July 26, 2019 | Yes | [26]
XLM-15 | Facebook | 250M | Multilingual (15) | Wikipedia | January 22, 2019 | Yes | [27][28]
XLM-17 | Facebook | 570M | Multilingual (17) | Wikipedia | August 17, 2019 | Yes | [28]
XLM-100 | Facebook | 570M | Multilingual (100) | Wikipedia | August 17, 2019 | Yes | [28]
XLM-R | Facebook | 550M | Multilingual (100) | CommonCrawl | November 5, 2019 | Yes | [29]
XLNet-Large | Carnegie Mellon University, Google Brain | 360M | English | XLNet (BERT + 3 corpora) | June 19, 2019 | Yes | [30]

Decoder-only

Non-finetuned

Name | Creator | Parameters | Languages | Trained on | Announced | Access | References
Anthropic-LM | Anthropic | 52B | English | | December 1, 2021 | No | [31]
BLOOM | BigScience | 176B | Multilingual (46) + Code (13) | ROOTS | July 6, 2022 | Yes | [32][33]
Chinchilla | DeepMind | 70B | English | MassiveText | March 29, 2022 | No | [34]
CodeGeeX | Tsinghua University | 13B | Code (20) | The Pile, CodeParrot | September 19, 2022 | Requestable | [35]
CodeGen | Salesforce | 16.1B | Code | GitHub | March 25, 2022 | Yes | [36]
Codex | OpenAI | 12B | Code | GitHub | July 7, 2021 | API | [37][38]
Cohere large | Cohere | 13.1B[39] | English | | November 15, 2021 | API | [40]
Cohere xlarge | Cohere | 52.4B[39] | English | | February 28, 2022 | API | [41]
CPM-1 | Tsinghua University | 2.6B | Chinese | | December 1, 2020 | Yes | [42][43]
DialoGPT | Microsoft | 0.762B | English | Reddit | November 1, 2019 | Yes | [44]
FairSeq Dense | Meta | 13B | English | RoBERTa + CC100 | December 20, 2021 | Yes | [45][46]
FairSeq Sparse | Meta | 1,100B (MoE) | English | RoBERTa + CC100 | December 20, 2021 | Requestable | [45][46]
Galactica | Meta | 120B | English | Scientific papers, etc. | November 16, 2022 | Yes | [47]
GLaM | Google | 1,200B (MoE) | English | News, books, etc. | December 13, 2021 | No | [48]
GLM-130B | Tsinghua University | 130B | English + Chinese | | August 4, 2022 | Yes | [49][50]
Gopher | DeepMind | 280B | English | MassiveText | December 8, 2021 | No | [51]
GPT-1 | OpenAI | 0.117B | English | BookCorpus | June 11, 2018 | Yes | [52]
GPT-2 | OpenAI | 1.558B | English | WebText | February 14, 2019 | Yes | [53]
GPT-3 | OpenAI | 175B | English | CommonCrawl, WebText2, etc. | May 28, 2020 | API | [54]
GPT-4 | OpenAI | ? | English | ? | March 14, 2023 | Online | [55][56]
GPT-Neo | EleutherAI | 2.7B | English | The Pile | March 22, 2021 | Yes | [57]
GPT-NeoX | EleutherAI | 20B | English | The Pile | April 14, 2022 | Yes | [58]
GPT-J | EleutherAI | 6B | English | The Pile | June 4, 2021 | Yes | [59]
GPT-JT | Together | 6B | English | The Pile | November 29, 2022 | Yes | [60]
GPT-SW3 | AI Sweden | 20B | Multilingual (5) | The Nordic Pile[61] | January 23, 2023 | Requestable | [62]
GPT-SW3 v1 | AI Sweden | 3.5B | Swedish | OSCAR, Web, etc. | February 15, 2022 | Yes | [63]
Grover-Mega | University of Washington | 1.5B | English | RealNews | May 29, 2019 | Yes | [64]
HyperCLOVA | Naver | 82B | Korean | | September 10, 2021 | No | [65]
J1-Jumbo | AI21 Labs | 178B | English | | August 12, 2021 | API | [66]
LaMDA (PT) | Google | 137B | English | | May 18, 2021 | No | [67][68]
LLaMA | Meta | 65B | English | CommonCrawl, C4, etc. | February 24, 2023 | Requestable | [69][70]
Luminous Supreme | Aleph Alpha | 70B | Multilingual (5) | | August 15, 2022 | API | [71]
Meena | Google | 2.6B | English | Social media | January 27, 2020 | No | [72]
mGPT | Sberbank | 13B | Multilingual (60) | Wikipedia, C4 | April 15, 2022 | Yes | [73]
Mistral | Stanford University | 0.335B | English | OpenWebText | August 26, 2021 | Yes | [74]
Megatron-Turing NLG | Microsoft, NVIDIA | 530B | English | CommonCrawl | January 28, 2022 | No | [75]
OPT | Meta | 175B | English | RoBERTa, The Pile, PushShift.io Reddit | May 3, 2022 | Online, requestable | [76]
PAGnol | LightOn | 1.5B | French | CCNet, OSCAR | October 16, 2021 | Online, API | [77]
Pythia | EleutherAI | 12B | English | The Pile | February 13, 2023 | Yes | [78]
PaLM | Google | 540B | English | Social media, filtered webpages, etc. | April 5, 2022 | No | [79]
SantaCoder | BigCode | 1.1B | Code (3) | The Stack | January 9, 2023 | Yes | [80]
Turing-NLG | Microsoft | 17B | English | | February 13, 2020 | No | [81]
Wu Dao 2.0 | BAAI | 1,750B (MoE) | English + Chinese (multimodal) | WuDaoCorpora | May 31, 2021 | No | [82]
YaLM | Yandex | 100B | Russian + English | The Pile, Yandex pages, etc. | June 23, 2022 | Yes | [83][84]
Yuan 1.0 | Inspur | 245B | Chinese | CommonCrawl, etc. | October 10, 2021 | No | [85]

Finetuned

Name | Creator | Parameters | Languages | Finetuning | Announced | Access | References
Alpaca | Stanford University | 7B | English | LLaMA (Instruction) | March 13, 2023 | Online, reproducible | [86]
BlenderBot 3 | Meta | 175B | English | OPT (Dialogue) | April 5, 2022 | Online | [87]
BLOOMZ | BigScience | 176B | Multilingual (46) + Code (13) | BLOOM (Multitask) | November 3, 2022 | Yes | [12]
Cohere command | Cohere | 52.4B[39] | English | Cohere xlarge (Instruction) | November 8, 2022[88] | API | [89]
FLAN | Google | 137B | English | LaMDA-PT (Instruction) | September 3, 2021 | No | [90]
Flan-PaLM | Google | 540B | English | PaLM (Instruction) | October 20, 2022 | No | [11]
GPT-NeoXT-Chat-Base-20B | Together | 20B | English | GPT-NeoX (Dialogue) | March 10, 2023 | Yes | [91]
InstructGPT-3 (SFT) | OpenAI | 175B | English | GPT-3 (Instruction) | March 4, 2022 | API | [92]
LaMDA | Google | 137B | English, multilingual | LaMDA-PT (Dialogue) | May 18, 2021 | No | [93][94]
OPT-IML | Meta | 175B | English | OPT (Instruction) | December 22, 2022 | Requestable | [95]

RLHF

These models were fine-tuned with reinforcement learning from human feedback (RLHF); a simplified, illustrative sketch of one RLHF-style update step follows the table below.

Name | Creator | Parameters | Languages | Trained on | Announced | Access | References
Anthropic-LM v4-s3 | Anthropic | 52B[39] | English | | April 12, 2022 | Online, API | [96]
InstructGPT-3 (PPO) | OpenAI | 175B | English | | March 4, 2022 | API | [92]
ChatGLM-6B | Tsinghua University | 6.2B | Chinese + English | GLM | March 13, 2023 | Yes | [97]
ChatGPT | OpenAI | 175B | English, multilingual | CommonCrawl, WebText2, etc. | November 30, 2022 | Online, API | [98]
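The sketch below is only a minimal illustration of the RLHF idea (sample from a policy language model, score the sample with a reward model, push the policy toward high-reward samples). It is not the recipe used by any model in the table above: production pipelines such as InstructGPT use PPO with a reward model trained on human preference rankings and a KL penalty against the original model, whereas this sketch uses a plain REINFORCE update, and a public sentiment classifier stands in for a learned preference reward model. The checkpoints ("gpt2" and the DistilBERT SST-2 classifier) are stand-ins chosen only so the example runs.

```python
# Minimal RLHF-style sketch: sample from a policy LM, score samples with a
# reward model, and nudge the policy toward high-reward samples.
# Simplifications vs. real RLHF (e.g. InstructGPT): REINFORCE instead of PPO,
# no KL penalty against a reference model, and a sentiment classifier as a
# stand-in reward model.
import torch
from transformers import (AutoModelForCausalLM,
                          AutoModelForSequenceClassification, AutoTokenizer)

policy_tok = AutoTokenizer.from_pretrained("gpt2")               # stand-in policy LM
policy = AutoModelForCausalLM.from_pretrained("gpt2")
reward_name = "distilbert-base-uncased-finetuned-sst-2-english"  # stand-in reward model
reward_tok = AutoTokenizer.from_pretrained(reward_name)
reward_model = AutoModelForSequenceClassification.from_pretrained(reward_name)
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-6)

prompts = ["The movie was", "I think this restaurant is"]

for prompt in prompts:
    # 1. Sample a continuation from the current policy.
    inputs = policy_tok(prompt, return_tensors="pt")
    sample_ids = policy.generate(**inputs, do_sample=True, max_new_tokens=20,
                                 pad_token_id=policy_tok.eos_token_id)
    text = policy_tok.decode(sample_ids[0], skip_special_tokens=True)

    # 2. Score the sample with the reward model (logit of the "positive" class).
    with torch.no_grad():
        reward = reward_model(**reward_tok(text, return_tensors="pt",
                                           truncation=True)).logits[0, 1]

    # 3. REINFORCE step: weight the sample's negative log-likelihood by its
    #    reward, so minimizing the loss makes high-reward samples more likely
    #    and low-reward samples less likely. For simplicity the prompt tokens
    #    are included in the likelihood.
    nll = policy(sample_ids, labels=sample_ids).loss
    loss = reward * nll
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```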

References

  1. ^ Mike Lewis; et al. (2019). "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension". arXiv:1910.13461.
  2. ^ Yu Sun; et al. (2021). "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2107.02137.
  3. ^ Shuohuan Wang; et al. (2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731.
  4. ^ Adam Roberts; Collin Raffel (February 24, 2020). "Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer".
  5. ^ Colin Raffel; et al. (2020). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (PDF). Journal of Machine Learning Research. 21: 1–67.
  6. ^ Brian Lester; et al. (2021). "The Power of Scale for Parameter-Efficient Prompt Tuning". arXiv:2104.08691.
  7. ^ Yinhan Liu; et al. (2020). "Multilingual Denoising Pre-training for Neural Machine Translation". arXiv:2001.08210.
  8. ^ Linting Xue; et al. (2020). "mT5: A massively multilingual pre-trained text-to-text transformer". arXiv:2010.11934.
  9. ^ a b Yi Tay; et al. (2022). "UL2: Unifying Language Learning Paradigms". arXiv:2205.05131.
  10. ^ Yi Tay; Mostafa Dehghani (October 14, 2022). "UL2 20B: An Open Source Unified Language Learner". Google.
  11. ^ a b Hyung Won Chung; et al. (2022). "Scaling Instruction-Finetuned Language Models". arXiv:2210.11416.
  12. ^ a b Niklas Muennighoff; et al. (2022). "Crosslingual Generalization through Multitask Finetuning". arXiv:2211.01786.
  13. ^ Victor Sanh; et al. (2021). "Multitask Prompted Training Enables Zero-Shot Task Generalization". arXiv:2110.08207.
  14. ^ Zhenzhong Lan; et al. (2019). "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". arXiv:1909.11942.
  15. ^ Jacob Devlin; Ming-Wei Chang; Kenton Lee; Kristina Toutanova (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2.
  16. ^ Jacob Devlin; Ming-Wei Chang (November 2, 2018). "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google.
  17. ^ "bert-base-multilingual-cased". HuggingFace.
  18. ^ a b "bert/multilingual.md". google-research/bert.
  19. ^ "bert-base-chinese". HuggingFace.
  20. ^ Javier de la Rosa; et al. (2022). "BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling". arXiv:2207.06814.
  21. ^ Pengcheng He; et al. (2020). "DeBERTa: Decoding-enhanced BERT with Disentangled Attention". arXiv:2006.03654.
  22. ^ Kevin Clark; et al. (2020). "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators". arXiv:2003.10555.
  23. ^ Divyanshu Kakwani; et al. (2020). IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages. Findings of the Association for Computational Linguistics. Association for Computational Linguistics.
  24. ^ Sumanth Doddapaneni; et al. (2022). "IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages". arXiv:2212.05409.
  25. ^ Simran Khanuja; et al. (2021). "MuRIL: Multilingual Representations for Indian Languages". arXiv:2103.10730.
  26. ^ Yinhan Liu; et al. (2019). "RoBERTa: A Robustly Optimized BERT Pretraining Approach". arXiv:1907.11692.
  27. ^ Guillaume Lample; Alexis Conneau (2019). "Cross-lingual Language Model Pretraining". arXiv:1901.07291.
  28. ^ a b c "facebookresearch/XLM". GitHub.
  29. ^ Alexis Conneau; et al. (2019). "Unsupervised Cross-lingual Representation Learning at Scale". arXiv:1911.02116.
  30. ^ Zhilin Yang; et al. (2019). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv:1906.08237.
  31. ^ Amanda Askell; et al. (2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861.
  32. ^ "Introducing The World's Largest Open Multilingual Language Model: BLOOM". BigScience Blog.
  33. ^ Teven Le Scao; et al. (BigScience Workshop) (2022). "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". arXiv:2211.05100.
  34. ^ Jordan Hoffmann; et al. (2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556.
  35. ^ "CodeGeeX: A Multilingual Code Generation Model". September 19, 2022.
  36. ^ Erik Nijkamp; et al. (2022). "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". arXiv:2203.13474.
  37. ^ Mark Chen; et al. (2021). "Evaluating Large Language Models Trained on Code". arXiv:2107.03374.
  38. ^ Wojciech Zaremba; Greg Brockman; OpenAI (August 10, 2021). "OpenAI Codex".
  39. ^ a b c d "HELM". Stanford CRFM.
  40. ^ Cohere Team (November 15, 2021). "The Cohere Platform is now publicly available". Cohere.
  41. ^ Cohere Team (February 28, 2022). "Cohere launches Extremely Large (beta)". Cohere.
  42. ^ Zhengyan Zhang; et al. (2020). "CPM: A Large-scale Generative Chinese Pre-trained Language Model". arXiv:2012.00413.
  43. ^ "TsinghuaAI/CPM-1-Generate". GitHub.
  44. ^ Yizhe Zhang; et al. (2019). "DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation". arXiv:1911.00536.
  45. ^ a b Mikel Artetxe; et al. (2021). "Efficient Large Scale Language Modeling with Mixtures of Experts". arXiv:2112.10684.
  46. ^ a b "facebookresearch/fairseq". GitHub.
  47. ^ Ross Taylor; et al. (2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085.
  48. ^ Nan Du; et al. (2021). "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". arXiv:2112.06905.
  49. ^ "GLM-130B: An Open Bilingual Pre-Trained Model". August 4, 2022.
  50. ^ Aohan Zeng; et al. (2022). "GLM-130B: An Open Bilingual Pre-trained Model". arXiv:2210.02414.
  51. ^ Jack W. Rae; et al. (2021). "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". arXiv:2112.11446.
  52. ^ Alec Radford; Karthik Narasimhan; Tim Salimans; Ilya Sutskever (2018). "Improving Language Understanding by Generative Pre-Training" (PDF). Archived (PDF) from the original on January 26, 2021. Retrieved June 9, 2020.
  53. ^ Radford, Alec; Wu, Jeffrey; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (February 14, 2019). "Language models are unsupervised multitask learners" (PDF). 1 (8). Archived (PDF) from the original on February 6, 2021. Retrieved December 19, 2020.
  54. ^ Brown, Tom B.; et al. (2020). "Language Models are Few-Shot Learners". arXiv:2005.14165 [cs.CL].
  55. ^ "GPT-4". OpenAI. March 14, 2023.
  56. ^ OpenAI. "GPT-4 Technical Report" (PDF).
  57. ^ Sid Black; et al. (2021), GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow, doi:10.5281/zenodo.5297715
  58. ^ Sid Black; Stella Biderman; et al. (2022). "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". arXiv:2204.06745.
  59. ^ Aran Komatsuzaki (June 4, 2021). "GPT-J-6B: 6B JAX-Based Transformer".
  60. ^ Together (November 29, 2022). "Releasing v1 of GPT-JT powered by open-source AI". Together.
  61. ^ Magnus Sahlgren (September 22, 2022). "The Nordic Pile".
  62. ^ Daniel Gillblad (January 23, 2023). "GPT-SW3 Pre-release".
  63. ^ Ariel Ekgren; et al. (2022). Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish. Proceedings of the Thirteenth Language Resources and Evaluation Conference.
  64. ^ Rowan Zellers; et al. (2019). "Defending Against Neural Fake News". arXiv:1905.12616.
  65. ^ Boseop Kim; et al. (2021). "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers". arXiv:2109.04650.
  66. ^ Opher Lieber; Or Sharir; Barak Lenz; Yoav Shoham (2021). "Jurassic-1: Technical details and evaluation" (PDF).
  67. ^ Eli Collins; Zoubin Ghahramani (May 18, 2021). "LaMDA: our breakthrough conversation technology". Google.
  68. ^ Romal Thoppilan; et al. (2022). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239.
  69. ^ Hugo Touvron; et al. (2023). "LLaMA: Open and Efficient Foundation Language Models". arXiv:2302.13971.
  70. ^ "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. February 24, 2023.
  71. ^ @Aleph__Alpha (August 15, 2022). "🌟 Luminous-Supreme is now available! ☑️ After Luminous-Base and Luminous-Extended, Luminous-Supreme is the newest and most powerful generation of our multilingual language models. ▶️ https://lnkd.in/e2Zwq_3V #writtenbyahuman #writtenbyalephalpha" (Tweet) – via Twitter.
  72. ^ Daniel Adiwardana; et al. (2020). "Towards a Human-like Open-Domain Chatbot". arXiv:2001.09977.
  73. ^ Oleh Shliazhko; et al. (2022). "mGPT: Few-Shot Learners Go Multilingual". arXiv:2204.07580.
  74. ^ Siddharth Karamcheti; Laurel Orr (August 26, 2021). "Mistral — A Journey towards Reproducible Language Model Training".
  75. ^ Shaden Smith; et al. (2022). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990.
  76. ^ Susan Zhang; et al. (2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068.
  77. ^ Julien Launay; et al. (2021). "PAGnol: An Extra-Large French Generative Model". arXiv:2110.08554.
  78. ^ "EleutherAI/pythia". GitHub.
  79. ^ Aakanksha Chowdhery; et al. (2022). "PaLM: Scaling Language Modeling with Pathways". arXiv:2204.02311.
  80. ^ Loubna Ben Allal; et al. (2023). "SantaCoder: don't reach for the stars!". arXiv:2301.03988.
  81. ^ Corby Rosset (February 13, 2020). "Turing-NLG: A 17-billion-parameter language model by Microsoft". Microsoft Research Blog.
  82. ^ Feng, Coco (June 2, 2021). "Beijing-funded AI language model tops Google and OpenAI in raw numbers". South China Morning Post.
  83. ^ Mikhail Khrushchev (June 23, 2022). Medium.
  84. ^ "yandex/YaLM-100B". GitHub.
  85. ^ Shaohua Wu; et al. (2021). "Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning". arXiv:2110.04725.
  86. ^ Rohan Taori; et al. (2023). "Alpaca: A Strong Instruction-Following Model".
  87. ^ Kurt Shuster; et al. (2022). "BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage". arXiv:2208.03188.
  88. ^ @CohereAI (November 8, 2022). "🔥 Command Beta now available → a new capability that responds to single-sentence commands (i.e., zero-shot prompts). We have some sample shots to get you started → https://hubs.li/Q01rQq1q0. Sign up to try it out → https://hubs.li/Q01rQfsR0" (Tweet) – via Twitter.
  89. ^ "Command Nightly". Cohere.
  90. ^ Jason Wei; et al. (2021). "Finetuned Language Models Are Zero-Shot Learners". arXiv:2109.01652v5.
  91. ^ Together (March 10, 2023). "Announcing OpenChatKit". Together.
  92. ^ a b Long Ouyang; et al. (2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155.
  93. ^ Eli Collins; Zoubin Ghahramani (May 18, 2021). "LaMDA: our breakthrough conversation technology". Google.
  94. ^ Romal Thoppilan; et al. (2022). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239.
  95. ^ Srinivasan Iyer; et al. (2022). "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization". arXiv:2212.12017.
  96. ^ Yuntao Bai; et al. (2022). "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". arXiv:2204.05862.
  97. ^ "THUDM/ChatGLM-6B". GitHub.
  98. ^ "Introducing ChatGPT". OpenAI. November 30, 2022. Retrieved March 15, 2023.