Sphinx (search engine)
From Wikipedia, the free encyclopedia
| Developer(s) | Andrew Aksyonoff |
|---|---|
| Initial release | 2001 |
| Stable release | 2.0.3-release / December 2011 |
| Written in | C++ |
| Operating system | Linux, Windows, Solaris, FreeBSD, NetBSD, Mac OS, AIX |
| Type | Search engine |
| License | GPLv2 or proprietary[1] |
| Website | http://sphinxsearch.com/ |
Sphinx is a free software search engine designed with indexing database content in mind. It natively supports MySQL, PostgreSQL, and ODBC-compliant databases as data sources. Other data sources can be indexed by piping their content to the indexer in a custom XML format. It is distributed under the terms of the GNU General Public License version 2 or under a proprietary license.[1]
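With the XML pipe route, an external program writes documents to its standard output and the Sphinx indexer reads that stream. The Sphinx documentation calls this source type xmlpipe2. The following Python sketch shows a minimal producer. The element layout (`sphinx:docset`, `sphinx:schema`, `sphinx:field`, `sphinx:document`) is an assumption based on the xmlpipe2 format and should be checked against the documentation for the Sphinx version in use. The function `to_xmlpipe2` and its input shape are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical input shape: (document id, {field name: text}) pairs from a non-SQL source.
Doc = tuple[int, dict[str, str]]


def to_xmlpipe2(fields: list[str], docs: list[Doc]) -> str:
    """Serialize documents as an xmlpipe2-style XML stream.

    Element layout assumed from the xmlpipe2 format; verify against the Sphinx docs.
    Field names are used as element names, so they must be valid XML names.
    """
    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", "<sphinx:schema>"]
    for name in fields:
        # Each full-text field is declared once in the schema.
        lines.append(f"<sphinx:field name={quoteattr(name)}/>")
    lines.append("</sphinx:schema>")
    for doc_id, values in docs:
        lines.append(f'<sphinx:document id="{doc_id}">')
        for name in fields:
            # Escape &, < and > so arbitrary text stays well-formed XML.
            lines.append(f"<{name}>{escape(values.get(name, ''))}</{name}>")
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)


if __name__ == "__main__":
    # A data-source script would write this to stdout for the indexer to read via the pipe.
    print(to_xmlpipe2(["title", "content"], [(1, {"title": "Hello", "content": "AT&T <news>"})]))
```

Escaping the text is the main point here. Without it, a single `&` or `<` in the source data would make the whole stream malformed.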
Starting from version 0.9.9, Sphinx can be queried using SphinxQL, a subset of SQL. Starting from version 1.10-beta, both incremental indexing (via the real-time backend[2]) and batch indexing are supported.
Sphinx is used by more than 100 websites and services, including Craigslist.org.[3]
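The difference between batch and incremental indexing can be illustrated with a toy in-memory inverted index. This Python sketch is a conceptual illustration only. It does not reflect Sphinx's on-disk or real-time index formats, and the class `ToyIndex` and its methods are hypothetical. `build` rebuilds an index from a full snapshot of the data, while `add` makes one new document searchable immediately.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index mapping term -> set of document ids (hypothetical, not Sphinx's format)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)

    @classmethod
    def build(cls, docs: dict[int, str]) -> "ToyIndex":
        # Batch indexing: construct a fresh index from a complete snapshot of the source;
        # the result would replace any previously built index.
        index = cls()
        for doc_id, text in docs.items():
            index.add(doc_id, text)
        return index

    def add(self, doc_id: int, text: str) -> None:
        # Incremental indexing: a single new document becomes searchable right away,
        # without rebuilding anything else.
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[int]:
        # Boolean AND: return ids of documents that contain every query term.
        terms = query.lower().split()
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

For example, `ToyIndex.build({1: "fast full text search", 2: "slow text scan"})` indexes two documents in one pass. A later `add(3, "Full text again")` makes document 3 match `"full text"` without a rebuild.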
Features
- Batch and incremental (soft real-time) full-text indexing.
- Support for non-text attributes (scalars, strings, sets).
- Direct indexing of SQL databases. Native support for MySQL, PostgreSQL, MSSQL, plus ODBC connectivity.
- Indexing of XML documents.
- Distributed searching support out of the box.
- Integration via client APIs.
- SQL-like query syntax (SphinxQL) via the MySQL protocol (since 0.9.9).
- Full-text query syntax.
- Database-like result set processing.
- Relevance ranking utilizing additional factors besides standard BM25.
- Text processing support for single-byte character set (SBCS) and UTF-8 encodings, stopwords, indexing of words without positional information ("hitless" words), stemming, word forms, tokenizing exceptions, and "blended characters" (characters indexed both as regular characters and as word separators).
- Support for user-defined functions (UDFs) (since 2.0.1).
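For reference, here is a minimal Python sketch of the textbook Okapi BM25 formula that Sphinx's ranking builds on. It is not Sphinx's own ranker, which adds further factors. The parameter values k1 = 1.2 and b = 0.75 and the non-negative IDF variant are common choices assumed here. The function `bm25_scores` is hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with textbook Okapi BM25 (not Sphinx's ranker)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs  # average document length in tokens
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue  # absent terms contribute nothing
            # Non-negative IDF variant (assumed): ln(1 + (N - n + 0.5) / (n + 0.5)).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates repeated occurrences; b scales length normalization against avgdl.
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores
```

Terms that are rare across the collection get a higher IDF. Repeated occurrences of the same term in one document add progressively less to its score.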
Performance and scalability
- Indexing speed of up to 10-15 MB/s per CPU core and hard disk.
- Searching speed of up to 200-300 queries/s against a 1,000,000-document, 1.2 GB collection.
- The largest known production instance indexes 8.1 billion documents,[4] and the busiest known one (Craigslist) serves over 50,000,000 queries per day.
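Simple arithmetic puts these figures in perspective. 50,000,000 queries per day averages to about 579 queries per second. At 10-15 MB/s, a 1.2 GB collection would take roughly 80-120 seconds to index on a single core. This Python sketch assumes decimal units (1.2 GB = 1,200 MB) and perfectly linear scaling across cores. Both are simplifications, and the helper functions are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_qps(queries_per_day: float) -> float:
    """Average query rate implied by a daily total; peak rates will be higher."""
    return queries_per_day / SECONDS_PER_DAY


def indexing_seconds(collection_mb: float, mb_per_sec_per_core: float, cores: int = 1) -> float:
    """Idealized indexing time, assuming throughput scales linearly with cores (and disks)."""
    return collection_mb / (mb_per_sec_per_core * cores)


if __name__ == "__main__":
    print(f"{average_qps(50_000_000):.0f} queries/s on average")  # 579
    # 1.2 GB treated as 1,200 MB (decimal units assumed).
    print(f"{indexing_seconds(1200, 15):.0f}-{indexing_seconds(1200, 10):.0f} s on one core")  # 80-120
```

The average rate is only a lower bound on the load a production deployment must handle, because query traffic is rarely spread evenly over the day.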
References
1. "Commercial licensing". http://sphinxsearch.com/licensing.html. Retrieved 24 February 2010.
2. "RT-index manual". Sphinx Technologies Inc. http://sphinxsearch.com/docs/current.html#rt-indexes. Retrieved 29 October 2011.
3. "Powered by Sphinx". Sphinx Technologies Inc. http://sphinxsearch.com/info/powered/. Retrieved 1 April 2011.
4. http://www.infegy.com
Further reading
- Aksyonoff, Andrew (2011). Introduction to Search with Sphinx: From installation to relevance tuning. O'Reilly Media. ISBN 978-0-596-80955-3. http://shop.oreilly.com/product/9780596809539.do.
- Ali, Abbas (2011). Sphinx Search Beginner's Guide. Birmingham, England: Packt Publishing. ISBN 978-1-84951-254-1. http://www.packtpub.com/sphinx-search-beginners-guide/book.
