Stop word
Revision as of 19:31, 8 November 2007
Stop words, or stopwords, are words which are filtered out before or after the processing of natural language data (text). Hans Peter Luhn, one of the pioneers in information retrieval, is credited with coining the phrase and using the concept in his design. The choice of which words to filter is controlled by human input and is not automated. This filtering is sometimes seen as a negative approach to the natural articles of speech.
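A minimal sketch of filtering text against a stop list in Python is shown below. The word list, the regular-expression tokenizer and the function names are illustrative assumptions, not taken from any particular tool. The hand-picked list reflects the point that stop words are chosen by human input.

```python
import re

# A tiny, hand-picked stop list. It is an illustrative assumption, not a
# standard list; published lists (such as Snowball's) are much longer.
STOP_WORDS = frozenset({"a", "an", "and", "are", "in", "is", "of", "the", "to", "which"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens: list[str], stop_words: frozenset[str] = STOP_WORDS) -> list[str]:
    """Keep only the tokens that are not on the stop list, in their original order."""
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    tokens = tokenize("The list of stop words is chosen by hand.")
    print(remove_stop_words(tokens))  # ['list', 'stop', 'words', 'chosen', 'by', 'hand']
```

Here "by" survives only because this particular list omits it. A different hand-made list would give a different result.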
There is no definitive list of stop words that all natural language processing (NLP) tools incorporate, and not all NLP tools use a stoplist at all. Some tools specifically avoid stoplists in order to support phrase searching, since a phrase query may consist largely of common words. The use of a stemming algorithm may reduce part of the rationale for, or dependence on, a stoplist to filter out words.[citation needed]
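The sketch below shows why a tool might skip the stoplist to support phrase searching. It assumes a hypothetical stop list and a naive contiguous-match phrase search. Real search engines typically use positional indexes instead. The stoplist is optional here, mirroring tools that do not use one. When one is applied, a phrase made entirely of common words disappears and can no longer be matched.

```python
import re
from typing import Optional

# Hypothetical stop list chosen to show the problem; not a standard list.
STOP_WORDS = frozenset({"be", "is", "not", "or", "the", "to"})


def index_terms(text: str, stop_words: Optional[frozenset[str]] = None) -> list[str]:
    """Tokenize text, dropping stop words only when a stop list is supplied."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if stop_words is None:
        return tokens  # tool without a stoplist: every word is kept
    return [t for t in tokens if t not in stop_words]


def contains_phrase(doc_terms: list[str], phrase_terms: list[str]) -> bool:
    """Naive phrase match: does phrase_terms occur as a contiguous run in doc_terms?"""
    n = len(phrase_terms)
    if n == 0:
        return False  # the whole phrase was filtered away, so nothing can match
    return any(doc_terms[i:i + n] == phrase_terms for i in range(len(doc_terms) - n + 1))


if __name__ == "__main__":
    doc = "To be, or not to be, that is the question."
    query = "to be or not to be"
    for stop_list in (None, STOP_WORDS):
        found = contains_phrase(index_terms(doc, stop_list), index_terms(query, stop_list))
        print("no stoplist" if stop_list is None else "with stoplist", found)
```

Without the stoplist the phrase is found. With it, the query reduces to an empty list and the search returns False.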
See also
- Text mining
- Concept mining
- Information extraction
- Natural language processing
- Query expansion
- Stemming
- Search engine indexing
External links
- A List of English Stop Words (about 3 kilobytes).
- The Snowball project currently provides lists of stopwords for English, French, Spanish, German, Portuguese, Italian, Dutch, Swedish, Norwegian, Danish, Russian, Finnish and Hungarian as part of a software stemmer project. These lists are used in other software, such as the Perl Lingua::StopWords module.
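Stop lists like Snowball's are distributed as plain text, and the Python sketch below shows one way to load such a file. It assumes the layout used by the Snowball lists, where a vertical bar starts a comment running to the end of the line and the remaining whitespace-separated tokens are the stop words. The function name and the sample text are hypothetical and are not copied from a real list.

```python
def parse_snowball_stop_list(text: str) -> set[str]:
    """Parse a stop list laid out in the Snowball style (assumed here):
    a vertical bar starts a comment that runs to the end of the line,
    and any remaining whitespace-separated tokens are stop words."""
    words: set[str] = set()
    for line in text.splitlines():
        content = line.split("|", 1)[0]  # discard the comment part, if any
        words.update(content.split())
    return words


# Hypothetical excerpt written in that style; not copied from a real list.
SAMPLE = """\
 | English stop words (illustrative excerpt)
i              |  first person singular
me
the            |  definite article
 | a line that holds only a comment
and
"""

if __name__ == "__main__":
    print(sorted(parse_snowball_stop_list(SAMPLE)))  # ['and', 'i', 'me', 'the']
```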