Gensim

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by Thomwiggers (talk | contribs) at 07:17, 21 September 2016 (Update Github link). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Gensim
Original author(s): Radim Řehůřek
Developer(s): various
Stable release: 0.13.2 / 26 August 2016
Repository
Written in: Python
Platform: cross-platform
Type: Natural language processing
License: LGPL
Website: radimrehurek.com/gensim/

Gensim is an open-source vector space modeling and topic modeling toolkit, implemented in the Python programming language. It uses NumPy, SciPy and optionally Cython for performance. It is specifically intended for handling large text collections, using efficient online, incremental algorithms. Gensim is commercially supported by the startup RaRe Technologies.[1]
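The idea of online, incremental processing can be illustrated with a minimal sketch in plain Python. This is not Gensim's API or implementation; the names `StreamingCorpus` and `OnlineDocFrequency` are hypothetical and exist only for this example. It shows documents being consumed one at a time while a running statistic is updated, so the whole collection never has to be held in memory.

```python
from collections import Counter


class StreamingCorpus:
    """Hypothetical helper: yields one tokenized document at a time."""

    def __init__(self, lines):
        # `lines` can be any iterable of strings, e.g. an open file object.
        self.lines = lines

    def __iter__(self):
        for line in self.lines:
            yield line.lower().split()


class OnlineDocFrequency:
    """Hypothetical helper: document frequencies updated incrementally."""

    def __init__(self):
        self.num_docs = 0
        self.df = Counter()

    def update(self, tokens):
        # Count each distinct term once per document.
        self.num_docs += 1
        self.df.update(set(tokens))


texts = [
    "Human machine interface",
    "Graph of trees",
    "Graph minors survey",
]

stats = OnlineDocFrequency()
for tokens in StreamingCorpus(texts):
    stats.update(tokens)  # only the current document is in memory
```

Because `update` needs only the current document, the same loop would work unchanged if `texts` were replaced by a file with one document per line.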

Gensim includes implementations of tf-idf, random projections, the word2vec and doc2vec algorithms,[2] hierarchical Dirichlet processes (HDP), latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), including distributed parallel versions.[3]
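As an illustration of the simplest of these techniques, the following sketch computes tf-idf weights in plain Python. It uses one common formulation (raw term frequency times a base-2 logarithmic inverse document frequency, followed by unit-length normalization); this choice is an assumption made for the example, and the function `tfidf` is hypothetical rather than part of Gensim.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed weighting: tf * log2(N / df), then L2 normalization.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each term counted once per document

    weighted = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: c * math.log2(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # Leave the vector unscaled if every weight is zero.
        weighted.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return weighted


docs = [
    ["graph", "trees"],
    ["graph", "minors", "survey"],
    ["human", "interface"],
]
vectors = tfidf(docs)
```

In the first document, "trees" occurs in only one document of the collection and therefore receives a higher weight than "graph", which occurs in two.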

Gensim has been used and cited in over 300 commercial and academic applications[4][unreliable source?] and has been described in several news articles and interviews.[5][6] The code is hosted on GitHub,[7] and a support forum is maintained on Google Groups.[8]

Some of the online algorithms in Gensim were also published in the 2011 PhD dissertation Scalability of Semantic Analysis in Natural Language Processing by Radim Řehůřek, the creator of Gensim.[9]

References

  1. ^ RaRe Technologies official site
  2. ^ Deep learning with word2vec and gensim
  3. ^ Radim Řehůřek and Petr Sojka (2010). Software framework for topic modelling with large corpora. Proc. LREC Workshop on New Challenges for NLP Frameworks.
  4. ^ academic citations of Gensim
  5. ^ Interview with Radim Řehůřek, creator of Gensim
  6. ^ http://decisionstats.com/2015/12/07/decisionstats-interview-radim-rehurek-gensim-python/
  7. ^ gensim source code on Github
  8. ^ gensim mailing list on Google Groups
  9. ^ Rehurek, Radim (2011). "Scalability of Semantic Analysis in Natural Language Processing" (PDF). Retrieved 27 January 2015. "my open-source gensim software package that accompanies this thesis"