From Wikipedia, the free encyclopedia
Original author(s): Radim Řehůřek
Developer(s): various
Stable release: 0.10.0 (4 June 2014)
Development status: active
Written in: Python
Platform: cross-platform
Type: Natural language processing
License: LGPL

Gensim is an open-source toolkit for vector space modeling and topic modeling. It is implemented in the Python programming language on top of NumPy and SciPy, with optional Cython extensions for performance. It is specifically intended for handling large text collections: its efficient online (streaming) algorithms process documents incrementally, so the whole corpus does not have to fit in memory.
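The streaming idea can be illustrated with a minimal pure-Python sketch. This is not Gensim's code or API: the function names `stream_bow` and `document_frequencies` are hypothetical, and the naive lowercase whitespace tokenizer is an assumption for brevity. Each document is turned into a sparse bag-of-words list of `(token_id, count)` pairs as it is read, so only the current document and the growing vocabulary are held in memory.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# A document in sparse bag-of-words form: (token_id, count) pairs, sorted by id.
BowDoc = List[Tuple[int, int]]


def stream_bow(lines: Iterable[str], token2id: Dict[str, int]) -> Iterator[BowDoc]:
    """Yield one bag-of-words vector per input line (hypothetical helper).

    `lines` can be any iterable, e.g. an open text file, so documents are
    read one at a time instead of being loaded all at once.
    Tokenization is a naive lowercase whitespace split (assumption).
    """
    for line in lines:
        counts: Counter = Counter()
        for token in line.lower().split():
            # Unseen tokens get the next free integer id.
            token_id = token2id.setdefault(token, len(token2id))
            counts[token_id] += 1
        yield sorted(counts.items())


def document_frequencies(lines: Iterable[str]) -> Tuple[Dict[str, int], Counter, int]:
    """One streaming pass: count how many documents contain each token id."""
    token2id: Dict[str, int] = {}
    df: Counter = Counter()
    n_docs = 0
    for bow in stream_bow(lines, token2id):
        n_docs += 1
        for token_id, _count in bow:
            df[token_id] += 1
    return token2id, df, n_docs
```

For example, `document_frequencies(open("corpus.txt", encoding="utf-8"))` (a hypothetical file with one document per line) would scan the file once without ever holding all of it in memory. Because the input is only iterated, the same code works on a small in-memory list.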

Gensim includes implementations of tf–idf, random projections, the word2vec word-embedding algorithm developed at Google[1] (reimplemented and optimized in Cython), hierarchical Dirichlet processes (HDP), latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), along with distributed parallel versions of LSA and LDA.[2]
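As an illustration of the first item in this list, the sketch below computes tf–idf weights for sparse bag-of-words documents. It is one common variant (raw term count times log2(N / df), then L2 normalization), chosen here as an assumption; Gensim's own tf–idf model has its own configurable weighting and is not reproduced here. The names `idf_weights` and `tfidf` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

BowDoc = List[Tuple[int, int]]         # (token_id, raw count)
WeightedDoc = List[Tuple[int, float]]  # (token_id, tf-idf weight)


def idf_weights(corpus: List[BowDoc]) -> Dict[int, float]:
    """Inverse document frequency log2(N / df) for every token id in the corpus."""
    n_docs = len(corpus)
    df: Dict[int, int] = {}
    for bow in corpus:
        for token_id, _count in bow:
            df[token_id] = df.get(token_id, 0) + 1
    return {tid: math.log2(n_docs / count) for tid, count in df.items()}


def tfidf(bow: BowDoc, idf: Dict[int, float]) -> WeightedDoc:
    """Weight raw counts by idf, drop zero weights, then L2-normalize."""
    weighted = [(tid, cnt * idf.get(tid, 0.0)) for tid, cnt in bow]
    # A token occurring in every document has idf 0 and carries no information.
    weighted = [(tid, w) for tid, w in weighted if w != 0.0]
    norm = math.sqrt(sum(w * w for _, w in weighted))
    return [(tid, w / norm) for tid, w in weighted] if norm > 0 else []
```

With `corpus = [[(0, 1), (1, 2)], [(0, 1), (2, 1)]]`, token 0 appears in both documents and gets idf 0, so `tfidf(corpus[0], idf_weights(corpus))` keeps only token 1 with weight 1.0. Normalizing to unit length makes cosine similarity between documents a plain dot product.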

Gensim has been used in a number of commercial as well as academic applications.[3][4] The code is hosted on GitHub[5] and a support forum is maintained on Google Groups.[6]


External links