User:AryamanA/List of large language models
Revision as of 19:23, 18 March 2023
Encoder-decoder
Non-finetuned
Name | Creator | Parameters | Languages | Trained on | Announced | Access | References |
---|---|---|---|---|---|---|---|
BART | Facebook | 0.4B | English | RoBERTa | October 29, 2019 | Yes | [1]
ByT5 | Google | 13B | Multilingual (101 languages) | mC4 | May 28, 2021 | Yes | [2]
ERNIE 3.0 | Baidu | 10B | Chinese | ERNIE 3.0 | July 5, 2021 | Yes | [3] |
ERNIE 3.0 Titan | Baidu | 260B | Chinese | ERNIE 3.0 | December 23, 2021 | No | [4] |
LM-adapted T5 | Google | 11B | English | C4 (more training) | April 18, 2021 | Yes | [5]
mBART | Facebook | 0.68B | Multilingual (25 languages) | CC25 | January 22, 2020 | Yes | [6]
mT5 | Google | 13B | Multilingual (101 languages) | mC4 | October 22, 2020 | Yes | [7]
T5 | Google | 11B | English | C4 | February 24, 2020 | Yes | [8][9]
UL2 | Google | 20B | English | C4 | May 10, 2022 | Yes | [10][11]
Finetuned
Name | Creator | Parameters | Languages | Finetuning | Announced | Access | References |
---|---|---|---|---|---|---|---|
ATLAS | Meta | 11B | English | T5+LM (Retrieval-augmented) | August 5, 2022 | No | [12]
Flan-T5 | Google | 11B | English | T5 (Instruction) | October 20, 2022 | Yes | [13]
mT0 | BigScience | 13B | Multilingual (46 languages) | mT5 (Multitask) | November 3, 2022 | Yes | [14]
mTk-Instruct | Allen Institute for AI and others | 13B | Multilingual (54 languages) | mT5-XXL (Instruction) | April 16, 2022 | Yes | [15]
T0, T0+, T0++ | BigScience | 11B | English | T5+LM (Multitask) | October 15, 2021 | Yes | [16]
Tk-Instruct | Allen Institute for AI and others | 11B | English | T5-XXL (Instruction) | April 16, 2022 | Yes | [15]
Flan-UL2 | Google | 20B | English | UL2 (Instruction) | February 28, 2023 | Yes | [10][17]
Encoder-only
Non-finetuned
Name | Creator | Parameters | Languages | Trained on | Announced | Access | References |
---|---|---|---|---|---|---|---|
ALBERT | Google | 17M | English | BERT | September 26, 2019 | Yes | [18]
BERT | Google | 340M | English | BERT (BookCorpus, English Wikipedia) | October 11, 2018 | Yes | [19][20]
mBERT | Google | 172M | Multilingual (104 languages) | Wikipedia | November 4, 2018 | Yes | [21][22]
BERT-Base, Chinese | Google | 102M | Chinese | Chinese Wikipedia | November 4, 2018 | Yes | [23][22]
BERTIN | BERTIN Project | 355M | Spanish | mC4 | July 14, 2022 | Yes | [24] |
DeBERTa | Microsoft | 1,500M | English | BERT + 3 corpora | June 5, 2020 | Yes | [25] |
ELECTRA-Large | Google | 334M | English | XLNet | March 23, 2020 | Yes | [26]
IndicBERT | AI4Bharat | 33M | Multilingual (12 languages) | IndicCorp | September 13, 2020 | Yes | [27]
IndicBERT v2 | AI4Bharat | 278M | Multilingual (24 languages) | IndicCorp v2 | November 13, 2022 | Yes | [28]
MuRIL | Google | 236M | Multilingual (17 languages) | OSCAR, Wikipedia | March 19, 2021 | Yes | [29]
RoBERTa | Facebook, University of Washington | 355M | English | BERT + 3 corpora | July 26, 2019 | Yes | [30] |
XLM-15 | Facebook | 250M | Multilingual (15 languages) | Wikipedia | January 22, 2019 | Yes | [31][32]
XLM-17 | Facebook | 570M | Multilingual (17 languages) | Wikipedia | August 17, 2019 | Yes | [32]
XLM-100 | Facebook | 570M | Multilingual (100 languages) | Wikipedia | August 17, 2019 | Yes | [32]
XLM-R | Facebook | 550M | Multilingual (100 languages) | CommonCrawl | November 5, 2019 | Yes | [33]
XLNet-Large | Carnegie Mellon University, Google Brain | 360M | English | XLNet (BERT + 3 corpora) | June 19, 2019 | Yes | [34] |
Decoder-only
Non-finetuned
Name | Creator | Parameters | Languages | Trained on | Announced | Access | References |
---|---|---|---|---|---|---|---|
Anthropic-LM | Anthropic | 52B | English | | December 1, 2021 | No | [35]
BLOOM | BigScience | 176B | Multilingual (46 languages) + Code (13 languages) | ROOTS | July 6, 2022 | Yes | [36][37]
Chinchilla | DeepMind | 70B | English | MassiveText | March 29, 2022 | No | [38] |
CodeGeeX | Tsinghua University | 13B | Code (20 languages) | The Pile, CodeParrot | September 19, 2022 | Requestable | [39]
CodeGen | Salesforce | 16.1B | Code | GitHub | March 25, 2022 | Yes | [40] |
Codex | OpenAI | 12B | Code | GitHub | July 7, 2021 | API | [41][42] |
Cohere large | Cohere | 13.1B[43] | English | | November 15, 2021 | API | [44]
Cohere xlarge | Cohere | 52.4B[43] | English | | February 28, 2022 | API | [45]
CPM-1 | Tsinghua University | 2.6B | Chinese | | December 1, 2020 | Yes | [46][47]
DialoGPT | Microsoft | 0.762B | English | | November 1, 2019 | Yes | [48]
FairSeq Dense | Meta | 13B | English | RoBERTa + CC100 | December 20, 2021 | Yes | [49][50] |
FairSeq Sparse | Meta | 1,100B (MoE) | English | RoBERTa + CC100 | December 20, 2021 | Requestable | [49][50]
Galactica | Meta | 120B | English | Scientific papers, etc. | November 16, 2022 | Yes | [51] |
GLaM | Google | 1,200B (MoE) | English | News, books, etc. | December 13, 2021 | No | [52]
GLM-130B | Tsinghua University | 130B | English + Chinese | | August 4, 2022 | Yes | [53][54]
Gopher | DeepMind | 280B | English | MassiveText | December 8, 2021 | No | [55] |
GPT-1 | OpenAI | 0.117B | English | BookCorpus | June 11, 2018 | Yes | [56] |
GPT-2 | OpenAI | 1.558B | English | WebText | February 14, 2019 | Yes | [57] |
GPT-3 | OpenAI | 175B | English | CommonCrawl, WebText2, etc. | May 28, 2020 | API | [58] |
GPT-4 | OpenAI | ? | English | ? | March 14, 2023 | Online | [59][60] |
GPT-Neo | EleutherAI | 2.7B | English | The Pile | March 22, 2021 | Yes | [61] |
GPT-NeoX | EleutherAI | 20B | English | The Pile | April 14, 2022 | Yes | [62] |
GPT-J | EleutherAI | 6B | English | The Pile | June 4, 2021 | Yes | [63] |
GPT-JT | Together | 6B | English | The Pile | November 29, 2022 | Yes | [64] |
GPT-SW3 | AI Sweden | 20B | Multilingual (5 languages) | The Nordic Pile[65] | January 23, 2023 | Requestable | [66]
GPT-SW3 v1 | AI Sweden | 3.5B | Swedish | OSCAR, Web, etc. | February 15, 2022 | Yes | [67] |
Grover-Mega | University of Washington | 1.5B | English | RealNews | May 29, 2019 | Yes | [68] |
HyperCLOVA | Naver | 82B | Korean | | September 10, 2021 | No | [69]
J1-Jumbo | AI21 Labs | 178B | English | | August 12, 2021 | API | [70]
LaMDA (PT) | Google | 137B | English | | May 18, 2021 | No | [71][72]
LLaMA | Meta | 65B | English | CommonCrawl, C4, etc. | February 24, 2023 | Requestable | [73][74] |
Luminous Supreme | Aleph Alpha | 70B[43] | Multilingual (5 languages) | | August 15, 2022 | API | [75]
Meena | Google | 2.6B | English | Social media | January 27, 2020 | No | [76]
mGPT | Sberbank | 13B | Multilingual (60 languages) | Wikipedia, C4 | April 15, 2022 | Yes | [77]
Mistral | Stanford University | 0.335B | English | OpenWebText | August 26, 2021 | Yes | [78] |
Megatron-Turing NLG | Microsoft, NVIDIA | 530B | English | CommonCrawl | January 28, 2022 | No | [79] |
OPT | Meta | 175B | English | RoBERTa, The Pile, PushShift.io Reddit | May 3, 2022 | Online, requestable | [80] |
PAGnol | LightOn | 1.5B | French | CCNet, OSCAR | October 16, 2021 | Online, API | [81] |
PanGu-α | PanGu-α Team | 200B | Chinese | CommonCrawl, etc. | April 26, 2021 | No | [82] |
Pythia | EleutherAI | 12B | English | The Pile | February 13, 2023 | Yes | [83] |
PaLM | Google | 540B | English | Social media, filtered webpages, etc. | April 5, 2022 | No | [84]
SantaCoder | BigCode | 1.1B | Code (3 languages) | The Stack | January 9, 2023 | Yes | [85]
Turing-NLG | Microsoft | 17B | English | | February 13, 2020 | No | [86]
U-PaLM | Google | 540B | English | Social media, filtered webpages, etc. | October 22, 2022 | No | [87]
Wu Dao 2.0 | BAAI | 1,750B (MoE) | English + Chinese (multimodal) | WuDaoCorpora | May 31, 2021 | No | [88]
YaLM | Yandex | 100B | Russian + English | The Pile, Yandex pages, etc. | June 23, 2022 | Yes | [89][90] |
Yuan 1.0 | Inspur | 245B | Chinese | CommonCrawl, etc. | October 10, 2021 | No | [91] |
Finetuned
Name | Creator | Parameters | Languages | Finetuning | Announced | Access | References |
---|---|---|---|---|---|---|---|
Alpaca | Stanford University | 7B | English | LLaMA (Instruction) | March 13, 2023 | Reproducible | [92] |
BlenderBot 3 | Meta | 175B | English | OPT (Dialogue) | August 5, 2022 | Online | [93]
BLOOMZ | BigScience | 176B | Multilingual (46 languages) + Code (13 languages) | BLOOM (Multitask) | November 3, 2022 | Yes | [14]
Cohere command | Cohere | 52.4B[43] | English | Cohere xlarge (Instruction) | November 8, 2022[94] | API | [95] |
FLAN | Google | 137B | English | LaMDA-PT (Instruction) | September 3, 2021 | No | [96]
Flan-PaLM | Google | 540B | English | PaLM (Instruction) | October 20, 2022 | No | [13]
Flan-U-PaLM | Google | 540B | English | U-PaLM (Instruction) | October 20, 2022 | No | [13]
GPT-NeoXT-Chat-Base-20B | Together | 20B | English | GPT-NeoX (Dialogue) | March 10, 2023 | Yes | [97] |
InstructGPT-3 (SFT) | OpenAI | 175B | English | GPT-3 (Instruction) | March 4, 2022 | API | [98] |
LaMDA | Google | 137B | English, multilingual | LaMDA-PT (Dialogue) | May 18, 2021 | No | [99][100]
OPT-IML | Meta | 175B | English | OPT (Instruction) | December 22, 2022 | Requestable | [101] |
RLHF
These are models that were fine-tuned with reinforcement learning from human feedback (RLHF).
Name | Creator | Parameters | Languages | Trained on | Announced | Access | References |
---|---|---|---|---|---|---|---|
Anthropic-LM v4-s3 | Anthropic | 52B[43] | English | | April 12, 2022 | Online, API | [102]
InstructGPT-3 (PPO) | OpenAI | 175B | English | | March 4, 2022 | API | [98]
ChatGLM-6B | Tsinghua University | 6.2B | Chinese + English | GLM | March 13, 2023 | Yes | [103] |
ChatGPT | OpenAI | 175B | English, multilingual | CommonCrawl, WebText2, etc. | November 30, 2022 | Online, API | [104] |
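The models in this table share the same broad recipe: starting from a pretrained (and usually instruction-tuned) base model, a reward model is fit to human preference comparisons, and the policy is then optimized to produce high-reward responses while a KL penalty keeps it close to the reference model. The sketch below is only a toy illustration of that objective, not the training setup of any listed model: the response strings, reward values, and hyperparameters are made up, the reward array stands in for a learned reward model, and the update is a plain REINFORCE-style step with a KL penalty rather than the PPO variants actually used.

```python
import numpy as np

# Toy RLHF-style update: raise the probability of high-reward responses while a
# KL penalty keeps the policy near the reference (pre-RLHF) distribution.
responses = ["refuse", "terse answer", "helpful detailed answer"]  # made-up candidates
reward = np.array([0.1, 0.5, 1.0])      # stand-in for a learned reward model's scores
ref_logits = np.zeros(len(responses))   # reference policy: uniform over the candidates
logits = ref_logits.copy()              # policy being fine-tuned

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

lr, kl_coef = 0.5, 0.5                  # illustrative hyperparameters
for _ in range(200):
    probs = softmax(logits)
    ref_probs = softmax(ref_logits)
    # Reward with the KL penalty folded in per response (the penalty is treated as
    # part of the reward, as in InstructGPT-style setups).
    adjusted = reward - kl_coef * np.log(probs / ref_probs)
    baseline = probs @ adjusted          # expected adjusted reward, a variance-reducing baseline
    # REINFORCE gradient of the expected adjusted reward with respect to the logits.
    logits += lr * probs * (adjusted - baseline)

print({r: round(float(p), 3) for r, p in zip(responses, softmax(logits))})
# Probability mass shifts toward the highest-reward response, but the KL term
# stops the policy from collapsing onto it entirely.
```

The deployed systems differ in scale and detail (token-level policies, learned reward models, PPO-style clipped updates), but the reward-minus-KL objective sketched here is the common core.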
References
- ^ Mike Lewis; et al. (2019). "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension". arXiv:1910.13461.
- ^ Linting Xue; et al. (2021). "ByT5: Towards a token-free future with pre-trained byte-to-byte models". arXiv:2105.13626.
- ^ Yu Sun; et al. (2021). "ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2107.02137.
- ^ Shuohuan Wang; et al. (2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation". arXiv:2112.12731.
- ^ Brian Lester; et al. (2021). "The Power of Scale for Parameter-Efficient Prompt Tuning". arXiv:2104.08691.
- ^ Yinhan Liu; et al. (2020). "Multilingual Denoising Pre-training for Neural Machine Translation". arXiv:2001.08210.
- ^ Linting Xue; et al. (2020). "mT5: A massively multilingual pre-trained text-to-text transformer". arXiv:2010.11934.
- ^ Adam Roberts; Colin Raffel (February 24, 2020). "Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer".
- ^ Colin Raffel; et al. (2020). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (PDF). Journal of Machine Learning Research. 21: 1–67.
- ^ a b Yi Tay; et al. (2022). "UL2: Unifying Language Learning Paradigms". arXiv:2205.05131.
- ^ Yi Tay; Mostafa Dehghani (October 14, 2022). "UL2 20B: An Open Source Unified Language Learner". Google.
- ^ Gautier Izacard; et al. (2022). "Atlas: Few-shot Learning with Retrieval Augmented Language Models". arXiv:2208.03299v3.
- ^ a b c Hyung Won Chung; et al. (2022). "Scaling Instruction-Finetuned Language Models". arXiv:2210.11416.
- ^ a b Niklas Muennighoff; et al. (2022). "Crosslingual Generalization through Multitask Finetuning". arXiv:2211.01786.
- ^ a b Yizhong Wang; et al. (2022). "Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks". arXiv:2204.07705.
- ^ Victor Sanh; et al. (2021). "Multitask Prompted Training Enables Zero-Shot Task Generalization". arXiv:2110.08207.
- ^ Yi Tay (March 3, 2023). "A New Open Source Flan 20B with UL2".
- ^ Zhenzhong Lan; et al. (2019). "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations". arXiv:1909.11942.
- ^ Jacob Devlin; Ming-Wei Chang; Kenton Lee; Kristina Toutanova (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2.
- ^ Jacob Devlin; Ming-Wei Chang (November 2, 2018). "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google.
- ^ "bert-base-multilingual-cased". HuggingFace.
- ^ a b "bert/multilingual.md". google-research/bert.
- ^ "bert-base-chinese". HuggingFace.
- ^ Javier de la Rosa; et al. (2022). "BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling". arXiv:2207.06814.
- ^ Pengcheng He; et al. (2020). "DeBERTa: Decoding-enhanced BERT with Disentangled Attention". arXiv:2006.03654.
- ^ Kevin Clark; et al. (2020). "ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators". arXiv:2003.10555.
- ^ Divyanshu Kakwani; et al. (2020). IndicNLPSuite: Monolingual Corpora, Evaluation Benchmarks and Pre-trained Multilingual Language Models for Indian Languages. Findings of the Association for Computational Linguistics. Association for Computational Linguistics.
- ^ Sumanth Doddapaneni; et al. (2022). "IndicXTREME: A Multi-Task Benchmark For Evaluating Indic Languages". arXiv:2212.05409.
- ^ Simran Khanuja; et al. (2021). "MuRIL: Multilingual Representations for Indian Languages". arXiv:2103.10730.
- ^ Yinhan Liu; et al. (2019). "RoBERTa: A Robustly Optimized BERT Pretraining Approach". arXiv:1907.11692.
- ^ Guillaume Lample; Alexis Conneau (2019). "Cross-lingual Language Model Pretraining". arXiv:1901.07291.
- ^ a b c "facebookresearch/XLM". GitHub.
- ^ Alexis Conneau; et al. (2019). "Unsupervised Cross-lingual Representation Learning at Scale". arXiv:1911.02116.
- ^ Zhilin Yang; et al. (2019). "XLNet: Generalized Autoregressive Pretraining for Language Understanding". arXiv:1906.08237.
- ^ Amanda Askell; et al. (2021). "A General Language Assistant as a Laboratory for Alignment". arXiv:2112.00861.
- ^ "Introducing The World's Largest Open Multilingual Language Model: BLOOM". BigScience Blog.
- ^ Teven Le Scao; et al. (BigScience Workshop) (2022). "BLOOM: A 176B-Parameter Open-Access Multilingual Language Model". arXiv:2211.05100.
- ^ Jordan Hoffmann; et al. (2022). "Training Compute-Optimal Large Language Models". arXiv:2203.15556.
- ^ "CodeGeeX: A Multilingual Code Generation Model". September 19, 2022.
- ^ Erik Nijkamp; et al. (2022). "CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis". arXiv:2203.13474.
- ^ Mark Chen; et al. (2021). "Evaluating Large Language Models Trained on Code". arXiv:2107.03374.
- ^ Wojciech Zaremba; Greg Brockman; OpenAI (August 10, 2021). "OpenAI Codex".
- ^ a b c d e "HELM". Stanford CRFM.
- ^ Cohere Team (November 15, 2021). "The Cohere Platform is now publicly available". Cohere.
- ^ Cohere Team (February 28, 2022). "Cohere launches Extremely Large (beta)". Cohere.
- ^ Zhengyan Zhang; et al. (2020). "CPM: A Large-scale Generative Chinese Pre-trained Language Model". arXiv:2012.00413.
- ^ "TsinghuaAI/CPM-1-Generate". GitHub.
- ^ Yizhe Zhang; et al. (2019). "DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation". arXiv:1911.00536.
- ^ a b Mikel Artetxe; et al. (2021). "Efficient Large Scale Language Modeling with Mixtures of Experts". arXiv:2112.10684.
- ^ a b "facebookresearch/fairseq". GitHub.
- ^ Ross Taylor; et al. (2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085.
- ^ Nan Du; et al. (2021). "GLaM: Efficient Scaling of Language Models with Mixture-of-Experts". arXiv:2112.06905.
- ^ "GLM-130B: An Open Bilingual Pre-Trained Model". August 4, 2022.
- ^ Aohan Zeng; et al. (2022). "GLM-130B: An Open Bilingual Pre-trained Model". arXiv:2210.02414.
- ^ Jack W. Rae; et al. (2021). "Scaling Language Models: Methods, Analysis & Insights from Training Gopher". arXiv:2112.11446.
- ^ "Improving Language Understanding by Generative Pre-Training" (PDF). Archived (PDF) from the original on January 26, 2021. Retrieved June 9, 2020.
- ^ Radford, Alec; Wu, Jeffrey; Child, Rewon; Luan, David; Amodei, Dario; Sutskever, Ilya (February 14, 2019). "Language models are unsupervised multitask learners" (PDF). Archived (PDF) from the original on February 6, 2021. Retrieved December 19, 2020.
- ^ Brown, Tom B.; et al. (2020). "Language Models are Few-Shot Learners". arXiv:2005.14165 [cs.CL].
- ^ "GPT-4". OpenAI. March 14, 2023.
- ^ OpenAI. "GPT-4 Technical Report" (PDF).
- ^ Sid Black; et al. (2021). "GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow". doi:10.5281/zenodo.5297715.
- ^ Sid Black; Stella Biderman; et al. (2022). "GPT-NeoX-20B: An Open-Source Autoregressive Language Model". arXiv:2204.06745.
- ^ Aran Komatsuzaki (June 4, 2021). "GPT-J-6B: 6B JAX-Based Transformer".
- ^ Together (November 29, 2022). "Releasing v1 of GPT-JT powered by open-source AI".
- ^ Magnus Sahlgren (September 22, 2022). "The Nordic Pile".
- ^ Daniel Gillblad (January 23, 2023). "GPT-SW3 Pre-release".
- ^ Ariel Ekgren; et al. (2022). Lessons Learned from GPT-SW3: Building the First Large-Scale Generative Language Model for Swedish. Proceedings of the Thirteenth Language Resources and Evaluation Conference.
- ^ Rowan Zellers; et al. (2019). "Defending Against Neural Fake News". arXiv:1905.12616.
- ^ Boseop Kim; et al. (2021). "What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers". arXiv:2109.04650.
- ^ Opher Lieber; Or Sharir; Barak Lenz; Yoav Shoham (2021). "Jurassic-1: Technical details and evaluation" (PDF).
- ^ Eli Collins; Zoubin Ghahramani (May 18, 2021). "LaMDA: our breakthrough conversation technology". Google.
- ^ Romal Thoppilan; et al. (2021). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239.
- ^ Hugo Touvron; et al. (2023). "LLaMA: Open and Efficient Foundation Language Models". arXiv:2302.13971.
- ^ "Introducing LLaMA: A foundational, 65-billion-parameter large language model". Meta AI. February 24, 2023.
- ^ @Aleph__Alpha (August 15, 2022). "🌟 Luminous-Supreme is now available! ☑️ After Luminous-Base and Luminous-Extended, Luminous-Supreme is the newest and most powerful generation of our multilingual language models. ▶️ https://lnkd.in/e2Zwq_3V #writtenbyahuman #writtenbyalephalpha" (Tweet) – via Twitter.
- ^ Daniel Adiwardana; et al. (2020). "Towards a Human-like Open-Domain Chatbot". arXiv:2001.09977.
- ^ Oleh Shliazhko; et al. (2022). "mGPT: Few-Shot Learners Go Multilingual". arXiv:2204.07580.
- ^ Siddharth Karamcheti; Laurel Orr (August 26, 2021). "Mistral — A Journey towards Reproducible Language Model Training".
- ^ Shaden Smith; et al. (2022). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model". arXiv:2201.11990.
- ^ Susan Zhang; et al. (2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068.
- ^ Julien Launay; et al. (2021). "PAGnol: An Extra-Large French Generative Model". arXiv:2110.08554.
- ^ Wei Zeng; et al. (2021). "PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation". arXiv:2104.12369.
- ^ "EleutherAI/pythia". GitHub.
- ^ Aakanksha Chowdhery; et al. (2022). "PaLM: Scaling Language Modeling with Pathways". arXiv:2204.02311.
- ^ Loubna Ben Allal; et al. (2023). "SantaCoder: don't reach for the stars!". arXiv:2301.03988.
- ^ Corby Rosset (February 13, 2020). "Turing-NLG: A 17-billion-parameter language model by Microsoft". Microsoft Research Blog.
- ^ Yi Tay; et al. (2022). "Transcending Scaling Laws with 0.1% Extra Compute". arXiv:2210.11399.
- ^ Feng, Coco (June 2, 2021). "Beijing-funded AI language model tops Google and OpenAI in raw numbers". South China Morning Post.
- ^ "Mikhail Khrushchev". Medium. June 23, 2022.
- ^ "yandex/YaLM-100B". GitHub.
- ^ Shaohua Wu; et al. (2021). "Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning". arXiv:2110.04725.
- ^ Rohan Taori; et al. (2023). "Alpaca: A Strong Instruction-Following Model".
- ^ Kurt Shuster; et al. (2022). "BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage". arXiv:2208.03188.
- ^ @CohereAI (November 8, 2022). "🔥 Command Beta now available → a new capability that responds to single-sentence commands (i.e., zero-shot prompts). We have some sample shots to get you started → https://hubs.li/Q01rQq1q0. Sign up to try it out → https://hubs.li/Q01rQfsR0" (Tweet) – via Twitter.
- ^ "Command Nightly". Cohere.
- ^ Jason Wei; et al. (2021). "Finetuned Language Models Are Zero-Shot Learners". arXiv:2109.01652v5.
- ^ Together (March 10, 2023). "Announcing OpenChatKit".
- ^ a b Long Ouyang; et al. (2022). "Training language models to follow instructions with human feedback". arXiv:2203.02155.
- ^ Eli Collins; Zoubin Ghahramani (May 18, 2021). "LaMDA: our breakthrough conversation technology". Google.
- ^ Romal Thoppilan; et al. (2021). "LaMDA: Language Models for Dialog Applications". arXiv:2201.08239.
- ^ Srinivasan Iyer; et al. (2022). "OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization". arXiv:2212.12017.
- ^ Yuntao Bai; et al. (2022). "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback". arXiv:2204.05862.
- ^ "THUDM/ChatGLM-6B". GitHub.
- ^ "Introducing ChatGPT". OpenAI. November 30, 2022. Retrieved March 15, 2023.