From Wikipedia, the free encyclopedia
Original author(s): David Cournapeau
Initial release: June 2007
Stable release: 0.15.2 / September 4, 2014[1]
Written in: Python, Cython, C and C++
Operating system: Linux, Mac OS X, Microsoft Windows
Type: Library for machine learning
License: BSD License

scikit-learn (formerly scikits.learn) is an open source machine learning library for the Python programming language.[2] It features various classification, regression and clustering algorithms including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
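The interoperation with NumPy described above can be illustrated with a minimal sketch. The toy data, variable names and parameter values are assumptions made up for this illustration, and the imports reflect recent scikit-learn releases rather than any particular version discussed in this article.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Toy data (invented for illustration): two well-separated 2-D blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(5.0, 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

# Classification: the estimator is fitted on NumPy arrays
# and predict() returns a NumPy array of class labels.
clf = LogisticRegression()
clf.fit(X, y)
pred = clf.predict(X)

# Clustering follows the same fit() convention; the cluster
# assignments are exposed as the NumPy array labels_.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
km.fit(X)
labels = km.labels_
```

Both estimators accept and return plain NumPy arrays, so their results can be passed directly to other NumPy or SciPy routines.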


The scikit-learn project started as scikits.learn, a Google Summer of Code project by David Cournapeau. Its name stems from the notion that it is a "SciKit" (SciPy Toolkit), a separately developed and distributed third-party extension to SciPy.[3] The original codebase was later extensively rewritten by other developers. Of the various scikits, scikit-learn and scikit-image were described as "well-maintained and popular" in November 2012.[4]

As of 2015, scikit-learn is under active development and is sponsored by INRIA, Telecom ParisTech and occasionally Google (through the Google Summer of Code).[5] Among its users are Evernote, which uses the library to distinguish recipes from other user posts through a naive Bayes classifier,[6] and Mendeley, which builds recommender systems from scikit-learn's SGD regression algorithm.[7]

The scikit-learn API has been adopted by the vendor of wiseRF, a proprietary implementation of random forests.[8][9] That vendor's business partner Continuum IO claimed data throughput of up to 7.5 times that of scikit-learn's implementation;[10] since then, the scikit-learn developers claim to have optimized their implementation to be competitive with wiseRF, except in terms of memory use.[11]


scikit-learn is largely written in Python, with some core algorithms written in Cython for performance. Support vector machines are implemented by a Cython wrapper around LIBSVM; logistic regression and linear support vector machines by a similar wrapper around LIBLINEAR.
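As a rough sketch of how these wrappers surface in the public API: the `SVC` class is backed by LIBSVM and `LinearSVC` by LIBLINEAR. In recent releases `LogisticRegression` supports several solvers, so the example assumes that `solver="liblinear"` is passed to select the LIBLINEAR backend explicitly; that parameter goes beyond what this article describes. The toy data is invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC, LinearSVC

# Toy binary problem (invented for illustration): two separable blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-2.0, 0.5, size=(25, 2)),
               rng.normal(2.0, 0.5, size=(25, 2))])
y = np.array([0] * 25 + [1] * 25)

# The native libraries stay hidden behind the usual estimator interface.
models = {
    "SVC (LIBSVM)": SVC(kernel="rbf"),
    "LinearSVC (LIBLINEAR)": LinearSVC(),
    # Assumption: the solver is chosen explicitly, as recent releases
    # default to a different optimizer.
    "LogisticRegression (LIBLINEAR)": LogisticRegression(solver="liblinear"),
}

# Fit each model and record its accuracy on the training data.
scores = {}
for name, model in models.items():
    model.fit(X, y)
    scores[name] = model.score(X, y)
```

All three estimators share the same `fit` and `score` calls, so the choice of underlying native library does not change the calling code.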



  1. Andreas Müller. "scikit-learn 0.15.2". Python Package Index.
  2. Fabian Pedregosa; Gaël Varoquaux; Alexandre Gramfort; Vincent Michel; Bertrand Thirion; Olivier Grisel; Mathieu Blondel; Peter Prettenhofer; Ron Weiss; Vincent Dubourg; Jake Vanderplas; Alexandre Passos; David Cournapeau (2011). "Scikit-learn: Machine Learning in Python". Journal of Machine Learning Research 12: 2825–2830.
  3. Janto Dreijer. "scikit-learn".
  4. Eli Bressert (2012). SciPy and NumPy: an overview for developers. O'Reilly. p. 43.
  5. "About Us". Retrieved 23 March 2015.
  6. Mark Ayzenshtat (22 January 2013). "Stay classified". Evernote Techblog. Retrieved 4 May 2013.
  7. Mark Levy (2013). Efficient Top-N Recommendation by Linear Regression. ACM RecSys Large Scale Recommender System workshop.
  8. "wiserf". Retrieved 22 January 2014.
  9. Lars Buitinck; Gilles Louppe; Mathieu Blondel; Fabian Pedregosa; Andreas Mueller; Olivier Grisel; Vlad Niculae et al. (2013). API design for machine learning software: experiences from the scikit-learn project. ECML PKDD Workshop on Languages for Machine Learning.
  10. Joseph W. Richards (27 November 2012). "wiseRF Use Cases and Benchmarks". Continuum IO. Retrieved 22 January 2014.
  11. Gaël Varoquaux (8 August 2013). "Scikit-learn 0.14 release: features and benchmarks". Retrieved 22 January 2014.
