Encoder-decoder

Non-finetuned

In the tables below, "Multilingual (N)" and "Code (N)" give the number of natural or programming languages covered, and "(MoE)" marks sparse mixture-of-experts parameter counts.

| Name | Creator | Parameters | Languages | Trained on | Announced | Access | References |
|------|---------|------------|-----------|------------|-----------|--------|------------|
| BART | Facebook | 0.4 B | English | RoBERTa | October 29, 2019 | Yes | [1] |
| ByT5 | Google | 13 B | Multilingual (101) | mC4 | May 28, 2021 | Yes | [2] |
| ERNIE 3.0 | Baidu | 10 B | Chinese | ERNIE 3.0 | July 5, 2021 | Yes | [3] |
| ERNIE 3.0 Titan | Baidu | 260 B | Chinese | ERNIE 3.0 | December 23, 2021 | No | [4] |
| LM-adapted T5 | Google | 11 B | English | C4 (more training) | April 18, 2021 | Yes | [5] |
| mBART | Facebook | 0.68 B | Multilingual (25) | CC25 | January 22, 2020 | Yes | [6] |
| mT5 | Google | 13 B | Multilingual (101) | mC4 | October 22, 2020 | Yes | [7] |
| T5 | Google | 11 B | English | C4 | February 24, 2020 | Yes | [8][9] |
| UL2 | Google | 20 B | English | C4 | May 10, 2022 | Yes | [10][11] |
Finetuned

| Name | Creator | Parameters | Languages | Finetuning | Announced | Access | References |
|------|---------|------------|-----------|------------|-----------|--------|------------|
| Flan-T5 | Google | 11 B | English | T5 (Instruction) | October 20, 2022 | Yes | [12] |
| mT0 | BigScience | 13 B | Multilingual (46) | mT5 (Multitask) | November 3, 2022 | Yes | [13] |
| T0, T0+, T0++ | BigScience | 11 B | English | T5+LM (Multitask) | October 15, 2021 | Yes | [14] |
| Flan-UL2 | Google | 20 B | English | UL2 (Instruction) | February 28, 2023 | Yes | [10][15] |
Encoder-only

Non-finetuned

| Name | Creator | Parameters | Languages | Trained on | Announced | Access | References |
|------|---------|------------|-----------|------------|-----------|--------|------------|
| ALBERT | Google | 17 M | English | BERT | September 26, 2019 | Yes | [16] |
| BERT | Google | 340 M | English | BERT (BookCorpus, English Wikipedia) | October 11, 2018 | Yes | [17][18] |
| mBERT | Google | 172 M | Multilingual (104) | Wikipedia | November 4, 2018 | Yes | [19][20] |
| BERT-Base, Chinese | Google | 102 M | Chinese | Chinese Wikipedia | November 4, 2018 | Yes | [21][20] |
| BERTIN | BERTIN Project | 355 M | Spanish | mC4 | July 14, 2022 | Yes | [22] |
| DeBERTa | Microsoft | 1,500 M | English | BERT + 3 corpora | June 5, 2020 | Yes | [23] |
| ELECTRA-Large | Google | 334 M | English | XLNet | March 23, 2020 | Yes | [24] |
| IndicBERT | AI4Bharat | 33 M | Multilingual (12) | IndicCorp | September 13, 2020 | Yes | [25] |
| IndicBERT v2 | AI4Bharat | 278 M | Multilingual (24) | IndicCorp v2 | November 13, 2022 | Yes | [26] |
| MuRIL | Google | 236 M | Multilingual (17) | OSCAR, Wikipedia | March 19, 2021 | Yes | [27] |
| RoBERTa | Facebook, University of Washington | 355 M | English | BERT + 3 corpora | July 26, 2019 | Yes | [28] |
| XLM-15 | Facebook | 250 M | Multilingual (15) | Wikipedia | January 22, 2019 | Yes | [29][30] |
| XLM-17 | Facebook | 570 M | Multilingual (17) | Wikipedia | August 17, 2019 | Yes | [30] |
| XLM-100 | Facebook | 570 M | Multilingual (100) | Wikipedia | August 17, 2019 | Yes | [30] |
| XLM-R | Facebook | 550 M | Multilingual (100) | CommonCrawl | November 5, 2019 | Yes | [31] |
| XLNet-Large | Carnegie Mellon University, Google Brain | 360 M | English | XLNet (BERT + 3 corpora) | June 19, 2019 | Yes | [32] |
Decoder-only

Non-finetuned

| Name | Creator | Parameters | Languages | Trained on | Announced | Access | References |
|------|---------|------------|-----------|------------|-----------|--------|------------|
| Anthropic-LM | Anthropic | 52 B | English | | December 1, 2021 | No | [33] |
| BLOOM | BigScience | 176 B | Multilingual (46) + Code (13) | ROOTS | July 6, 2022 | Yes | [34][35] |
| Chinchilla | DeepMind | 70 B | English | MassiveText | March 29, 2022 | No | [36] |
| CodeGeex | Tsinghua University | 13 B | Code (20) | The Pile, CodeParrot | September 19, 2022 | Requestable | [37] |
| CodeGen | Salesforce | 16.1 B | Code | GitHub | March 25, 2022 | Yes | [38] |
| Codex | OpenAI | 12 B | Code | GitHub | July 7, 2021 | API | [39][40] |
| Cohere large | Cohere | 13.1 B[41] | English | | November 15, 2021 | API | [42] |
| Cohere xlarge | Cohere | 52.4 B[41] | English | | February 28, 2022 | API | [43] |
| CPM-1 | Tsinghua University | 2.6 B | Chinese | | December 1, 2020 | Yes | [44][45] |
| DialoGPT | Microsoft | 0.762 B | English | Reddit | November 1, 2019 | Yes | [46] |
| FairSeq Dense | Meta | 13 B | English | RoBERTa + CC100 | December 20, 2021 | Yes | [47][48] |
| FairSeq Sparse | Meta | 1,100 B (MoE) | English | RoBERTa + CC100 | December 20, 2021 | Requestable | [47][48] |
| Galactica | Meta | 120 B | English | Scientific papers, etc. | November 16, 2022 | Yes | [49] |
| GLaM | Google | 1,200 B (MoE) | English | News, books, etc. | December 13, 2021 | No | [50] |
| GLM-130B | Tsinghua University | 130 B | English + Chinese | | August 4, 2022 | Yes | [51][52] |
| Gopher | DeepMind | 280 B | English | MassiveText | December 8, 2021 | No | [53] |
| GPT-1 | OpenAI | 0.117 B | English | BookCorpus | June 11, 2018 | Yes | [54] |
| GPT-2 | OpenAI | 1.558 B | English | WebText | February 14, 2019 | Yes | [55] |
| GPT-3 | OpenAI | 175 B | English | CommonCrawl, WebText2, etc. | May 28, 2020 | API | [56] |
| GPT-4 | OpenAI | ? | English | ? | March 14, 2023 | Online | [57][58] |
| GPT-Neo | EleutherAI | 2.7 B | English | The Pile | March 22, 2021 | Yes | [59] |
| GPT-NeoX | EleutherAI | 20 B | English | The Pile | April 14, 2022 | Yes | [60] |
| GPT-J | EleutherAI | 6 B | English | The Pile | June 4, 2021 | Yes | [61] |
| GPT-JT | Together | 6 B | English | The Pile | November 29, 2022 | Yes | [62] |
| GPT-SW3 | AI Sweden | 20 B | Multilingual (5) | The Nordic Pile[63] | January 23, 2023 | Requestable | [64] |
| GPT-SW3 v1 | AI Sweden | 3.5 B | Swedish | OSCAR, Web, etc. | February 15, 2022 | Yes | [65] |
| Grover-Mega | University of Washington | 1.5 B | English | RealNews | May 29, 2019 | Yes | [66] |
| HyperCLOVA | Naver | 82 B | Korean | | September 10, 2021 | No | [67] |
| J1-Jumbo | AI21 Labs | 178 B | English | | August 12, 2021 | API | [68] |
| LaMDA (PT) | Google | 137 B | English | | May 18, 2021 | No | [69][70] |
| LLaMA | Meta | 65 B | English | CommonCrawl, C4, etc. | February 24, 2023 | Requestable | [71][72] |
| Luminous Supreme | Aleph Alpha | 70 B[41] | Multilingual (5) | | August 15, 2022 | API | [73] |
| Meena | Google | 2.6 B | English | Social media | January 27, 2020 | No | [74] |
| mGPT | Sberbank | 13 B | Multilingual (60) | Wikipedia, C4 | April 15, 2022 | Yes | [75] |
| Mistral | Stanford University | 0.335 B | English | OpenWebText | August 26, 2021 | Yes | [76] |
| Megatron-Turing NLG | Microsoft, NVIDIA | 530 B | English | CommonCrawl | January 28, 2022 | No | [77] |
| OPT | Meta | 175 B | English | RoBERTa, The Pile, PushShift.io Reddit | May 3, 2022 | Online, requestable | [78] |
| PAGnol | LightOn | 1.5 B | French | CCNet, OSCAR | October 16, 2021 | Online, API | [79] |
| PanGu-α | PanGu-α Team | 200 B | Chinese | CommonCrawl, etc. | April 26, 2021 | No | [80] |
| Pythia | EleutherAI | 12 B | English | The Pile | February 13, 2023 | Yes | [81] |
| PaLM | Google | 540 B | English | Social media, filtered webpages, etc. | April 5, 2022 | No | [82] |
| SantaCoder | BigCode | 1.1 B | Code (3) | The Stack | January 9, 2023 | Yes | [83] |
| Turing-NLG | Microsoft | 17 B | English | | February 13, 2020 | No | [84] |
| Wu Dao 2.0 | BAAI | 1,750 B (MoE) | English + Chinese (multimodal) | WuDaoCorpora | May 31, 2021 | No | [85] |
| YaLM | Yandex | 100 B | Russian + English | The Pile, Yandex pages, etc. | June 23, 2022 | Yes | [86][87] |
| Yuan 1.0 | Inspur | 245 B | Chinese | CommonCrawl, etc. | October 10, 2021 | No | [88] |
Finetuned

| Name | Creator | Parameters | Languages | Finetuning | Announced | Access | References |
|------|---------|------------|-----------|------------|-----------|--------|------------|
| Alpaca | Stanford University | 7 B | English | LLaMA (Instruction) | March 13, 2023 | Reproducible | [89] |
| BlenderBot 3 | Meta | 175 B | English | OPT (Dialogue) | April 5, 2022 | Online | [90] |
| BLOOMZ | BigScience | 176 B | Multilingual (46) + Code (13) | BLOOM (Multitask) | November 3, 2022 | Yes | [13] |
| Cohere command | Cohere | 52.4 B[41] | English | Cohere xlarge (Instruction) | November 8, 2022[91] | API | [92] |
| FLAN | Google | 137 B | English | LaMDA-PT (Instruction) | September 3, 2021 | No | [93] |
| Flan-PaLM | Google | 540 B | English | PaLM (Instruction) | October 20, 2022 | No | [12] |
| GPT-NeoXT-Chat-Base-20B | Together | 20 B | English | GPT-NeoX (Dialogue) | March 10, 2023 | Yes | [94] |
| InstructGPT-3 (SFT) | OpenAI | 175 B | English | GPT-3 (Instruction) | March 4, 2022 | API | [95] |
| LaMDA | Google | 137 B | English, multilingual | LaMDA-PT (Dialogue) | May 18, 2021 | No | [96][97] |
| OPT-IML | Meta | 175 B | English | OPT (Instruction) | December 22, 2022 | Requestable | [98] |
RLHF

These are models that were fine-tuned with reinforcement learning from human feedback (RLHF); a minimal sketch of the objective this training optimizes follows the table.
| Name | Creator | Parameters | Languages | Trained on | Announced | Access | References |
|------|---------|------------|-----------|------------|-----------|--------|------------|
| Anthropic-LM v4-s3 | Anthropic | 52 B[41] | English | | April 12, 2022 | Online, API | [99] |
| InstructGPT-3 (PPO) | OpenAI | 175 B | English | | March 4, 2022 | API | [95] |
| ChatGLM-6B | Tsinghua University | 6.2 B | Chinese + English | GLM | March 13, 2023 | Yes | [100] |
| ChatGPT | OpenAI | 175 B | English, multilingual | CommonCrawl, WebText2, etc. | November 30, 2022 | Online, API | [101] |
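
As a rough illustration of what RLHF fine-tuning optimizes, the sketch below runs gradient ascent on a toy softmax policy, maximizing expected reward from a stand-in reward model minus a KL penalty that keeps the tuned policy close to a frozen reference policy (the KL-regularized objective used in, e.g., InstructGPT's PPO stage). Everything here is an illustrative assumption: the tiny "vocabulary", the random reward values, and the constants are invented for the sketch, not taken from any published system.

```python
# Minimal sketch (assumed toy setup) of the KL-regularised RLHF objective:
#   maximise  E_pi[reward] - beta * KL(pi || pi_ref)
# over the logits of a tabular softmax "policy". Real systems optimise this
# with PPO over sampled text; here the exact gradient is computable directly.
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 5                              # toy "vocabulary" size (assumption)
ref_logits = rng.normal(size=N_ACTIONS)    # frozen reference (pretrained) policy
logits = ref_logits.copy()                 # policy being fine-tuned
reward = rng.normal(size=N_ACTIONS)        # stand-in for a learned reward model
BETA = 0.1                                 # KL penalty weight (assumption)
LR = 0.5                                   # learning rate (assumption)

def softmax(z):
    z = z - z.max()                        # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum()

pi_ref = softmax(ref_logits)
for step in range(200):
    pi = softmax(logits)
    kl = float(np.sum(pi * (np.log(pi) - np.log(pi_ref))))
    objective = float(pi @ reward) - BETA * kl
    # Exact gradient of the objective w.r.t. the logits:
    #   grad_k = pi_k * (f_k - E_pi[f]),  f = reward - beta*(log pi - log pi_ref)
    f = reward - BETA * (np.log(pi) - np.log(pi_ref))
    logits += LR * pi * (f - pi @ f)       # gradient ascent step

print(f"objective {objective:.3f}, KL to reference {kl:.3f}")
```

Raising BETA pulls the tuned policy back toward the reference model; lowering it lets the policy chase the reward model more aggressively, which in practice risks over-optimizing the learned reward.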
References
^ Mike Lewis; et al. (2019). "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension". arXiv:1910.13461.
^ Linting Xue; et al. (2021). "ByT5: Towards a token-free future with pre-trained byte-to-byte models". arXiv:2105.13626.
^ Yu Sun; et al. (2021). "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2107.02137.
^ Shuohuan Wang; et al. (2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731.
^ Brian Lester; et al. (2021). "The Power of Scale for Parameter-Efficient Prompt Tuning". arXiv:2104.08691.
^ Yinhan Liu; et al. (2020). "Multilingual Denoising Pre-training for Neural Machine Translation". arXiv:2001.08210.
^ Linting Xue; et al. (2020). "mT5: A massively multilingual pre-trained text-to-text transformer". arXiv:2010.11934.
^ Adam Roberts; Colin Raffel (February 24, 2020). "Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer".
^ Colin Raffel; et al. (2020). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (PDF). Journal of Machine Learning Research. 21: 1–67.
^ Yi Tay; et al. (2022). "UL2: Unifying Language Learning Paradigms". arXiv:2205.05131.
^ Yi Tay; Mostafa Dehghani (October 14, 2022). "UL2 20B: An Open Source Unified Language Learner". Google.
^ Hyung Won Chung; et al. (2022). "Scaling Instruction-Finetuned Language Models". arXiv:2210.11416.
^ Niklas Muennighoff; et al. (2022). "Crosslingual Generalization through Multitask Finetuning". arXiv:2211.01786.
^ Victor Sanh; et al. (2021). "Multitask Prompted Training Enables Zero-Shot Task Generalization". arXiv:2110.08207.
^ Yi Tay (March 3, 2023). "A New Open Source Flan 20B with UL2".
^ Zhenzhong Lan; et al. (2019). "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". arXiv:1909.11942.
^ Jacob Devlin; Ming-Wei Chang; Kenton Lee; Kristina Toutanova (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2.
^ Jacob Devlin; Ming-Wei Chang (November 2, 2018). "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google.
^ "bert-base-multilingual-cased". HuggingFace.
^ "bert/multilingual.md". google-research/bert.
^ "bert-base-chinese". HuggingFace.
^ Javier de la Rosa; et al. (2022). "BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling". arXiv:2207.06814.
^ Pengcheng He; et al. (2020). "DeBERTa: Decoding-enhanced BERT with Disentangled Attention". arXiv:2006.03654.
^ Kevin Clark; et al. (2020). "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators". arXiv:2003.10555.
^ Divyanshu Kakwani; et al. (2020). IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages. Findings of the Association for Computational Linguistics. Association for Computational Linguistics.
^ Sumanth Doddapaneni; et al. (2022). "IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages". arXiv:2212.05409.
^ Simran Khanuja; et al. (2021). "MuRIL: Multilingual Representations for Indian Languages". arXiv:2103.10730.
^ Yinhan Liu; et al. (2019). "RoBERTa: A Robustly Optimized BERT Pretraining Approach". arXiv:1907.11692.
^ Guillaume Lample; Alexis Conneau (2019). "Cross-lingual Language Model Pretraining". arXiv:1901.07291.
^ "facebookresearch/XLM". GitHub.
^ Alexis Conneau; et al. (2019). "Unsupervised Cross-lingual Representation Learning at Scale". arXiv:1911.02116.
^ Zhilin Yang; et al. (2019). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv:1906.08237.
^ Amanda Askell; et al. (2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861.
^ "Introducing The World's Largest Open Multilingual Language Model: BLOOM". BigScience Blog.
^ Teven Le Scao; et al. (BigScience Workshop) (2022). "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". arXiv:2211.05100.
^ Jordan Hoffmann; et al. (2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556.
^ "CodeGeeX: A Multilingual Code Generation Model". September 19, 2022.
^ Erik Nijkamp; et al. (2022). "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". arXiv:2203.13474.
^ Mark Chen; et al. (2021). "Evaluating Large Language Models Trained on Code". arXiv:2107.03374.
^ Wojciech Zaremba; Greg Brockman; OpenAI (August 10, 2021). "OpenAI Codex".
^ "HELM". Stanford CRFM.
^ Cohere Team (November 15, 2021). "The Cohere Platform is now publicly available". Cohere.
^ Cohere Team (February 28, 2022). "Cohere launches Extremely Large (beta)". Cohere.
^ Zhengyan Zhang; et al. (2020). "CPM: A Large-scale Generative Chinese Pre-trained Language Model". arXiv:2012.00413.
^ "TsinghuaAI/CPM-1-Generate". GitHub.
^ Yizhe Zhang; et al. (2019). "DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation". arXiv:1911.00536.
^ Mikel Artetxe; et al. (2021). "Efficient Large Scale Language Modeling with Mixtures of Experts". arXiv:2112.10684.
^ "facebookresearch/fairseq". GitHub.
^ Ross Taylor; et al. (2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085.
^ Nan Du; et al. (2021). "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". arXiv:2112.06905.
^ "GLM-130B: An Open Bilingual Pre-Trained Model". August 4, 2022.
^ Aohan Zeng; et al. (2022). "GLM-130B: An Open Bilingual Pre-trained Model". arXiv:2210.02414.
^ Jack W. Rae; et al. (2021). "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". arXiv:2112.11446.
^ "Improving Language Understanding by Generative Pre-Training" (PDF). Archived (PDF) from the original on January 26, 2021. Retrieved June 9, 2020.
^ Radford, Alec; Wu, Jeffrey; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (February 14, 2019). "Language models are unsupervised multitask learners" (PDF). 1 (8). Archived (PDF) from the original on February 6, 2021. Retrieved December 19, 2020.
^ Brown, Tom B.; et al. (2020). "Language Models are Few-Shot Learners". arXiv:2005.14165 [cs.CL].
^ "GPT-4". OpenAI. March 14, 2023.
^ OpenAI. "GPT-4 Technical Report" (PDF).
^ Sid Black; et al. (2021). GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow. doi:10.5281/zenodo.5297715.
^ Sid Black; Stella Biderman; et al. (2022). "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". arXiv:2204.06745.
^ Aran Komatsuzaki (June 4, 2021). "GPT-J-6B: 6B JAX-Based Transformer".
^ Together (November 29, 2022). Releasing v1 of GPT-JT powered by open-source AI. Together.
^ Magnus Sahlgren (September 22, 2022). "The Nordic Pile".
^ Daniel Gillblad (January 23, 2023). "GPT-SW3 Pre-release".
^ Ariel Ekgren; et al. (2022). Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish. Proceedings of the Thirteenth Language Resources and Evaluation Conference.
^ Rowan Zellers; et al. "Defending Against Neural Fake News". arXiv:1905.12616.
^ Boseop Kim; et al. (2021). "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers". arXiv:2109.04650.
^ Opher Lieber; Or Sharir; Barak Lenz; Yoav Shoham. "Jurassic-1: Technical details and evaluation" (PDF).
^ Eli Collins; Zoubin Ghahramani (May 18, 2021). "LaMDA: our breakthrough conversation technology". Google.
^ Romal Thoppilan; et al. (2022). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239.
^ Hugo Touvron; et al. (2023). "LLaMA: Open and Efficient Foundation Language Models". arXiv:2302.13971.
^ "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. February 24, 2023.
^ @Aleph__Alpha (August 15, 2022). "🌟 Luminous-Supreme is now available! ☑️ After Luminous-Base and Luminous-Extended, Luminous-Supreme is the newest and most powerful generation of our multilingual language models. ▶️ https://lnkd.in/e2Zwq_3V #writtenbyahuman #writtenbyalephalpha" (Tweet) – via Twitter.
^ Daniel Adiwardana; et al. (2020). "Towards a Human-like Open-Domain Chatbot". arXiv:2001.09977.
^ Oleh Shliazhko; et al. (2022). "mGPT: Few-Shot Learners Go Multilingual". arXiv:2204.07580.
^ Siddharth Karamcheti; Laurel Orr (August 26, 2021). "Mistral — A Journey towards Reproducible Language Model Training".
^ Shaden Smith; et al. (2022). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990.
^ Susan Zhang; et al. (2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068.
^ Julien Launay; et al. (2021). "PAGnol: An Extra-Large French Generative Model". arXiv:2110.08554.
^ Wei Zeng; et al. (2021). "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation". arXiv:2104.12369.
^ "EleutherAI/pythia". GitHub.
^ Aakanksha Chowdhery; et al. (2022). "PaLM: Scaling Language Modeling with Pathways". arXiv:2204.02311.
^ Loubna Ben Allal; et al. (2023). "SantaCoder: don't reach for the stars!". arXiv:2301.03988.
^ Corby Rosset (February 13, 2020). "Turing-NLG: A 17-billion-parameter language model by Microsoft". Microsoft Research Blog.
^ Feng, Coco (June 2, 2021). "Beijing-funded AI language model tops Google and OpenAI in raw numbers". South China Morning Post.
^ "Mikhail Khrushchev" . Medium. June 23, 2022.
^ "yandex/YaLM-100B" . GitHub.
^ Shaohua Wu; et al. (2021). "Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning". arXiv :2110.04725 .
^ Rohan Taori; et al. "Alpaca: A Strong Instruction-Following Model" .
^ Kurt Shuster; et al. "BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage". arXiv :2208.03188 .
^ @CohereAI (November 8, 2022). "🔥 Command Beta now available → a new capability that responds to single-sentence commands (i.e., zero-shot prompts). We have some sample shots to get you started → https://hubs.li/Q01rQq1q0. Sign up to try it out → https://hubs.li/Q01rQfsR0" (Tweet ) – via Twitter .
^ "Command Nightly" . Cohere.
^ Jason Wei; et al. (2021). "Finetuned Language Models Are Zero-Shot Learners". arXiv :2109.01652v5 .
^ Together (March 10, 2023), Announcing OpenChatKit , Together
^ a b Long Ouyang; et al. (2022). "Training language models to follow instructions with human feedback". arXiv :2203.02155 .
^ Eli Collins; Zoubin Ghahramani (May 18, 2021). "LaMDA: our breakthrough conversation technology" . Google.
^ Romal Thoppilan; et al. (2022). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239.
^ Srinivasan Iyer; et al. (2022). "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization". arXiv:2212.12017.
^ Yuntao Bai; et al. "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". arXiv:2204.05862.
^ "THUDM/ChatGLM-6B". GitHub.
^ "Introducing ChatGPT". OpenAI. November 30, 2022. Retrieved March 15, 2023.