BERT (language model)

From Wikipedia, the free encyclopedia

Bidirectional Encoder Representations from Transformers (BERT) is a technique for natural language processing (NLP) pre-training developed by Google. BERT was created and published in 2018 by Jacob Devlin and his colleagues at Google.[1][2] Google uses BERT to better understand user search queries.[3]


BERT has its origins in earlier work on pre-training contextual representations, including Semi-supervised Sequence Learning,[4] Generative Pre-Training, ELMo,[5] and ULMFiT.[6] Unlike these previous models, however, BERT is the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus.
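The practical difference between a left-to-right model and a deeply bidirectional one can be pictured as the set of positions each token is allowed to condition on. The following toy Python sketch illustrates only that visibility pattern; it is not BERT's implementation, the function names are hypothetical, and it omits the training objective BERT uses to make bidirectional conditioning learnable.

```python
# Toy illustration, not BERT's implementation: which positions a token may
# condition on under a left-to-right model versus a bidirectional encoder.
# All function names here are hypothetical.


def left_to_right_mask(n: int) -> list[list[int]]:
    """Left-to-right (causal) mask: token i may see only positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> list[list[int]]:
    """Bidirectional mask: every token may see every position, left and right."""
    return [[1] * n for _ in range(n)]


def visible_context(mask: list[list[int]], tokens: list[str], i: int) -> list[str]:
    """Return the tokens that position i is allowed to see under `mask`."""
    return [tok for tok, allowed in zip(tokens, mask[i]) if allowed]


if __name__ == "__main__":
    tokens = ["the", "bank", "of", "the", "river"]
    i = tokens.index("bank")
    n = len(tokens)
    print(visible_context(left_to_right_mask(n), tokens, i))  # ['the', 'bank']
    print(visible_context(bidirectional_mask(n), tokens, i))
    # ['the', 'bank', 'of', 'the', 'river']
```

Under the left-to-right mask, "bank" cannot see "river", the word that disambiguates it; under the bidirectional mask it can.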

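Context-free models such as word2vec or GloVe generate a single word embedding representation for each word in the vocabulary, whereas BERT takes into account the context of each occurrence of a given word. For example, word2vec assigns the word "running" the same vector in "He is running a company" and in "He is running a marathon", while BERT produces a contextualized embedding that differs between the two sentences. Because BERT is deeply bidirectional, that context includes the words on both sides of each occurrence.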
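The contrast between a context-free lookup table and a contextual representation can be sketched in Python with the sentences "He is running a company" and "He is running a marathon". This is a toy under loudly stated assumptions: one-hot vectors stand in for a trained word2vec or GloVe table, and blending a word's vector with the sentence average stands in for BERT's Transformer encoder. All function names are hypothetical.

```python
# Toy contrast between a context-free embedding table and a contextual
# encoder. Both pieces are hypothetical simplifications: one-hot vectors stand
# in for a trained word2vec/GloVe table, and mixing in the sentence average
# stands in for BERT's Transformer encoder.


def one_hot_table(vocab: list[str]) -> dict[str, list[float]]:
    """Context-free lookup table: exactly one fixed vector per vocabulary word."""
    words = sorted(set(vocab))
    return {w: [1.0 if j == k else 0.0 for j in range(len(words))]
            for k, w in enumerate(words)}


def static_embedding(table: dict[str, list[float]], sentence: list[str], i: int) -> list[float]:
    """Context-free: the vector depends only on the word at position i."""
    return table[sentence[i]]


def contextual_embedding(table: dict[str, list[float]], sentence: list[str],
                         i: int, weight: float = 0.5) -> list[float]:
    """Toy contextual vector: blend the word's own vector with the mean of all
    words in the sentence, so words on both sides change the result."""
    dim = len(table[sentence[i]])
    mean = [sum(table[w][d] for w in sentence) / len(sentence) for d in range(dim)]
    own = table[sentence[i]]
    return [(1 - weight) * o + weight * m for o, m in zip(own, mean)]


if __name__ == "__main__":
    s1 = "he is running a company".split()
    s2 = "he is running a marathon".split()
    table = one_hot_table(s1 + s2)
    i1, i2 = s1.index("running"), s2.index("running")
    print(static_embedding(table, s1, i1) == static_embedding(table, s2, i2))  # True
    print(contextual_embedding(table, s1, i1) == contextual_embedding(table, s2, i2))  # False
```

The static lookup returns the identical vector for "running" in both sentences, while the toy contextual vector differs because "company" and "marathon" contribute different components.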

References


  1. Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding". arXiv:1810.04805v2.
  2. "Open Sourcing BERT: State-of-the-Art Pre-training for Natural Language Processing". Google AI Blog. Retrieved 2019-11-27.
  3. "Understanding searches better than ever before". Google. 2019-10-25. Retrieved 2019-11-27.
  4. Dai, Andrew; Le, Quoc (4 November 2015). "Semi-supervised Sequence Learning". arXiv:1511.01432.
  5. Peters, Matthew; Neumann, Mark; Iyyer, Mohit; Gardner, Matt; Clark, Christopher; Lee, Kenton; Zettlemoyer, Luke (15 February 2018). "Deep contextualized word representations". arXiv:1802.05365v2.
  6. Howard, Jeremy; Ruder, Sebastian (18 January 2018). "Universal Language Model Fine-tuning for Text Classification". arXiv:1801.06146v5.