
Stop word

From Wikipedia, the free encyclopedia


== External links ==
* [http://dev.mysql.com/doc/refman/5.5/en/fulltext-stopwords.html/ Full-Text Stopwords in MySQL ]
* [http://armandbrahaj.blog.al/2009/04/14/list-of-english-stop-words/ List of English Stop Words (PHP array, CSV) ]
* [http://www.textfixer.com/resources/common-english-words.txt English Stop Words (CSV)]

Revision as of 03:28, 17 March 2012

In computing, stop words are words that are filtered out before or after the processing of natural language data (text). Which words to filter is decided by people rather than determined automatically. There is no single definitive list of stop words used by all tools, and some tools do not use stop words at all; some specifically avoid them in order to support phrase search.
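The following is a minimal sketch in Python of removing stop words from text before further processing. The stop list `STOP_WORDS` and the helper names `tokenize` and `remove_stop_words` are hypothetical and used only for illustration. The list is a small set of common English function words and is not the list used by any particular tool. The tokenizer is deliberately naive.

```python
import re

# Hypothetical stop list for illustration only; real tools use different lists.
STOP_WORDS = frozenset({"the", "is", "at", "which", "on", "a", "an", "of"})


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(tokens: list[str], stop_words: frozenset[str] = STOP_WORDS) -> list[str]:
    """Drop every token that appears in the stop list, keeping the original order."""
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    print(remove_stop_words(tokenize("The cat is sitting on the mat at noon")))
    # -> ['cat', 'sitting', 'mat', 'noon']
```

Storing the list in a `frozenset` keeps each membership test a constant-time lookup on average, even for large stop lists.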

Any group of words can be chosen as the stop words for a given purpose. For some search engines, these are some of the most common short function words, such as "the", "is", "at", "which", and "on". In this case, stop words can cause problems when searching for phrases that include them, particularly in names such as 'The Who', 'The The', or 'Take That'. Other search engines remove some of the most common words, including lexical words such as "want", from a query in order to improve performance.[1]
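The following is a sketch of query-side removal of common words. It assumes a stop list built from word frequencies, where the `top_n` most frequent words in a sample corpus count as stop words. A person still chooses the corpus and the cutoff. The corpus `CORPUS` and the functions `build_stop_list` and `prune_query` are hypothetical. The Stack Overflow approach in the reference used the 10,000 most common English words as determined by Google search, and this toy example does not reproduce that list. In this corpus, the lexical word "want" ends up as a stop word, and the last call shows the phrase-search problem: a query for the band 'The The' is pruned to nothing.

```python
import re
from collections import Counter

# Hypothetical sample corpus; a real system would use a much larger one.
CORPUS = [
    "the users want the answer",
    "we want the query to be fast",
    "the index is on the server",
]


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_stop_list(corpus: list[str], top_n: int) -> set[str]:
    """Treat the top_n most frequent words in the corpus as stop words."""
    counts = Counter(tok for doc in corpus for tok in tokenize(doc))
    return {word for word, _ in counts.most_common(top_n)}


def prune_query(query: str, stop_words: set[str]) -> list[str]:
    """Remove stop words from a search query before it is submitted."""
    return [t for t in tokenize(query) if t not in stop_words]


if __name__ == "__main__":
    stop_words = build_stop_list(CORPUS, top_n=2)
    print(sorted(stop_words))  # -> ['the', 'want']
    print(prune_query("I want the fastest index", stop_words))  # -> ['i', 'fastest', 'index']
    print(prune_query("The Who", stop_words))  # -> ['who']
    print(prune_query("The The", stop_words))  # -> [] (the band name vanishes)
```

A larger `top_n` shrinks queries further, which can speed up a full-text engine. It also raises the chance that a meaningful query, such as a band name, loses every term.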

Hans Peter Luhn, one of the pioneers in information retrieval, is credited with coining the term and applying the concept in his work.

See also

References

  1. Stack Overflow: "One of our major performance optimizations for the “related questions” query is removing the top 10,000 most common English dictionary words (as determined by Google search) before submitting the query to the SQL Server 2008 full text engine. It’s shocking how little is left of most posts once you remove the top 10k English dictionary words. This helps limit and narrow the returned results, which makes the query dramatically faster."