Word embedding

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Qwertyus (talk | contribs) at 12:35, 22 June 2015. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing in which words (and possibly phrases) from the vocabulary are mapped to vectors of real numbers in a space whose dimension is low relative to the vocabulary size (a "continuous space").
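
As a rough illustration of this mapping, the sketch below contrasts a one-hot vector, which has one dimension per vocabulary word, with a lookup in a dense embedding table. The toy vocabulary, the dimension of 3, and the names `one_hot`, `embed` and `embedding_table` are assumptions made for illustration, and the table holds random placeholder values rather than learned ones.

```python
import random

# Hypothetical toy vocabulary; real vocabularies are far larger.
vocabulary = ["the", "cat", "sat", "on", "mat", "dog"]
word_to_index = {word: i for i, word in enumerate(vocabulary)}

# Assumed for illustration: an embedding dimension smaller than the vocabulary.
EMBEDDING_DIM = 3


def one_hot(word):
    """Sparse baseline: a vector with one dimension per vocabulary word."""
    vector = [0.0] * len(vocabulary)
    vector[word_to_index[word]] = 1.0
    return vector


# Embedding table: one dense row of real numbers per word.
# The values are random placeholders, not the result of any training.
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for _ in vocabulary
]


def embed(word):
    """Look up the dense vector stored for a word."""
    return embedding_table[word_to_index[word]]


print(len(one_hot("cat")), len(embed("cat")))
```

In a trained model the rows of the table would be fitted by one of the methods described in this article; only the shape of the mapping is shown here.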

There are several methods for generating this mapping. They include neural networks,[1] dimensionality reduction on the word co-occurrence matrix,[2] and explicit representation in terms of the context in which words appear.[3]
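
The last of these approaches, representing a word explicitly by the contexts in which it appears, can be sketched with plain co-occurrence counts. The corpus, the window size of 2, and the function names below are hypothetical. Raw counts are used for brevity, which is a simplifying assumption rather than the method of any cited work.

```python
from collections import Counter, defaultdict
import math

# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
]

# Assumed context window: words within 2 positions on either side.
WINDOW = 2


def cooccurrence_vectors(sentences, window=WINDOW):
    """Map each word to a sparse vector of context-word counts."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[context] for context, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


vectors = cooccurrence_vectors(corpus)
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["milk"]))
```

On this toy corpus "cat" and "dog" share more contexts than "cat" and "milk", so the first similarity is the larger of the two. Each row of such a count matrix has one dimension per context word, which is why dimensionality reduction can be applied to it to obtain lower-dimensional vectors.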

Word and phrase embeddings, when used as the underlying input representation, have been shown to improve performance on NLP tasks such as syntactic parsing[4] and sentiment analysis.[5]

References

  1. ^ Mikolov, Tomas; Sutskever, Ilya; Chen, Kai; Corrado, Greg; Dean, Jeffrey (2013). "Distributed Representations of Words and Phrases and their Compositionality". arXiv:1310.4546 [cs.CL].
  2. ^ Lebret, Rémi; Collobert, Ronan (2013). "Word Emdeddings through Hellinger PCA". arXiv:1312.5542 [cs.CL].
  3. ^ Levy, Omer; Goldberg, Yoav (June 2014). "Linguistic Regularities in Sparse and Explicit Word Representations" (PDF). Proceedings of the Eighteenth Conference on Computational Natural Language Learning. Baltimore, Maryland, USA: Association for Computational Linguistics.
  4. ^ Socher, Richard; Bauer, John; Manning, Christopher; Ng, Andrew (2013). "Parsing with compositional vector grammars" (PDF). Proceedings of the ACL conference.
  5. ^ Socher, Richard; Perelygin, Alex; Wu, Jean; Chuang, Jason; Manning, Chris; Ng, Andrew; Potts, Chris. "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank" (PDF). Conference on Empirical Methods in Natural Language Processing.