Sphinx (search engine)

From Wikipedia, the free encyclopedia
Developer(s): Andrew Aksyonoff
Initial release: 2001
Stable release: 2.0.3-release (December 2011)
Written in: C++
Operating system: Linux, Windows, Solaris, FreeBSD, NetBSD, Mac OS, AIX
Type: Search engine
License: GPLv2 or proprietary[1]
Website: http://sphinxsearch.com/

Sphinx is a free software search engine designed with indexing database content in mind. It natively supports MySQL, PostgreSQL, and ODBC-compliant databases as data sources. Other data sources can be indexed by piping documents to the indexer in a custom XML format. It is distributed under the terms of the GNU General Public License version 2 or a proprietary license.[1]
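
As an illustration of the XML pipe approach, the following Python sketch prints a small document set to standard output. The element names (sphinx:docset, sphinx:schema, sphinx:field, sphinx:document) follow the xmlpipe2 format as described in the Sphinx manual, but they are not taken from this article and should be checked against the documentation for the version in use. The function name and the "title" and "body" fields are hypothetical.

```python
import sys
from xml.sax.saxutils import escape, quoteattr


def build_xmlpipe_docset(documents, fields=("title", "body")):
    """Return an xmlpipe2-style XML string.

    `documents` is an iterable of (doc_id, dict) pairs; `fields` lists the
    full-text fields to declare. Both names are illustrative assumptions.
    """
    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>"]

    # Declare every full-text field once, before any document.
    lines.append("<sphinx:schema>")
    for name in fields:
        lines.append(f"  <sphinx:field name={quoteattr(name)}/>")
    lines.append("</sphinx:schema>")

    # Emit one element per document, escaping &, < and > in the text.
    for doc_id, values in documents:
        lines.append(f'<sphinx:document id="{int(doc_id)}">')
        for name in fields:
            text = escape(str(values.get(name, "")))
            lines.append(f"  <{name}>{text}</{name}>")
        lines.append("</sphinx:document>")

    lines.append("</sphinx:docset>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [(1, {"title": "Hello", "body": "Full-text search & indexing"})]
    sys.stdout.write(build_xmlpipe_docset(sample))
```

A script like this would typically be configured as the command whose output the indexer reads; the exact configuration keys depend on the Sphinx version.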

Starting from version 0.9.9, indexes can be queried using SphinxQL, a subset of SQL. Starting from version 1.10-beta, both incremental (via the real-time backend[2]) and batch indexing are supported.
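
A minimal sketch of what a SphinxQL query can look like, built as a string in Python. The index name "articles" and both helper functions are hypothetical. The escaping shown covers only the SQL string literal; the full-text query syntax inside MATCH() has its own special characters, which this sketch does not handle.

```python
def escape_sphinxql_string(value):
    """Escape backslashes and single quotes for a single-quoted literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_search_query(index_name, text, limit=10):
    """Build a SELECT that runs a full-text MATCH against one index.

    `index_name` is restricted to identifier characters because it is
    interpolated directly into the statement.
    """
    if not index_name.isidentifier():
        raise ValueError(f"unsafe index name: {index_name!r}")
    return (
        f"SELECT * FROM {index_name} "
        f"WHERE MATCH('{escape_sphinxql_string(text)}') "
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # "articles" is a made-up index name used only for illustration.
    print(build_search_query("articles", "full-text search", limit=5))
```

Because SphinxQL is spoken over the MySQL protocol, a statement like this could be sent with an ordinary MySQL client library pointed at the search daemon's SphinxQL listener; connection details are left out here because they depend on the deployment.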

Sphinx is used by more than 100 web sites and services, including Craigslist.org.[3]

Features

  • Batch and incremental (soft real-time) full-text indexing.
  • Support for non-text attributes (scalars, strings, sets).
  • Direct indexing of SQL databases. Native support for MySQL, PostgreSQL, MSSQL, plus ODBC connectivity.
  • XML document indexing support.
  • Distributed searching support out of the box.
  • Integration via access APIs
  • SQL-like syntax support via MySQL protocol (since 0.9.9)
  • Full-text searching syntax.
  • Database-like result set processing.
  • Relevance ranking utilizing additional factors besides standard BM25.
  • Text processing support for SBCS and UTF-8 encodings, stopwords, indexing of selected words without positional data ("hitless" words), stemming, word forms, tokenizing exceptions, and "blended characters" (dual indexing as both a real character and a word separator).
  • Support for user-defined functions (UDFs) (since 2.0.1).
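
For readers unfamiliar with the BM25 baseline mentioned in the list above, here is a minimal sketch of the textbook Okapi BM25 formula in Python. It is not Sphinx's ranker, which combines BM25 with other factors and may use a different variant; the function name and the parameter defaults k1=1.2 and b=0.75 are common textbook choices assumed here for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with Okapi BM25.

    `documents` is a list of token lists. Returns one float per document.
    """
    if not documents:
        return []

    term_counts = [Counter(doc) for doc in documents]
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs

    scores = []
    for doc, counts in zip(documents, term_counts):
        score = 0.0
        for term in query_terms:
            # Document frequency: how many documents contain the term.
            df = sum(1 for c in term_counts if term in c)
            if df == 0:
                continue
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            freq = counts[term]
            # Length normalization penalizes longer-than-average documents.
            norm = freq + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * freq * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["sphinx", "search", "engine"],
        ["database", "index"],
        ["sphinx", "sphinx", "index"],
    ]
    print(bm25_scores(["sphinx"], docs))
```

A document that does not contain any query term scores zero, and among documents of equal length the one with more occurrences of the term scores higher.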

Performance and scalability

  • Indexing speed of up to 10-15 MB/sec per core and HDD.
  • Searching speed of up to 200-300 queries/sec against 1,000,000-document, 1.2 GB collection.
  • The biggest known production instance indexes 8.1 billion documents;[4] the busiest known one (Craigslist) serves over 50,000,000 queries/day.
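
To put the daily query figure on the same scale as the per-second figure above, it can be converted to an average rate. This is plain arithmetic on the numbers quoted in the list; it says nothing about peak load or about how many machines serve the traffic, and the function name is made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(queries_per_day):
    """Convert a daily query count into an average queries-per-second rate."""
    return queries_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    # 50,000,000 queries/day is the figure quoted for the busiest instance.
    print(f"{average_qps(50_000_000):.1f} queries/sec on average")  # 578.7
```

The two figures describe different collections and hardware, so they are not directly comparable.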

References

  1. "Commercial licensing". http://sphinxsearch.com/licensing.html. Retrieved 24 February 2010.
  2. "RT-index manual". Sphinx Technologies Inc. http://sphinxsearch.com/docs/current.html#rt-indexes. Retrieved 29 October 2011.
  3. "Powered by Sphinx". Sphinx Technologies Inc. http://sphinxsearch.com/info/powered/. Retrieved 1 April 2011.
  4. http://www.infegy.com
