Statistical semantics is the study of "how the statistical patterns of human word usage can be used to figure out what people mean, at least to a level sufficient for information access". How can we figure out what words mean, simply by looking at patterns of words in huge collections of text? What are the limits to this approach to understanding words?
The term Statistical Semantics was first used by Warren Weaver in his well-known paper on machine translation. He argued that word sense disambiguation for machine translation should be based on the co-occurrence frequency of the context words near a given target word. The underlying assumption that "a word is characterized by the company it keeps" was advocated by J. R. Firth. This assumption is known in linguistics as the Distributional Hypothesis. Emile Delavenay defined Statistical Semantics as "Statistical study of meanings of words and their frequency and order of recurrence." Furnas et al. (1983) is frequently cited as a foundational contribution to Statistical Semantics. An early success in the field was Latent Semantic Analysis.
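The Distributional Hypothesis can be illustrated with a minimal sketch. It assumes a toy tokenized corpus and a symmetric context window of two words; both are illustrative choices and are not taken from the works cited here. Each word is represented by counts of the words that occur near it. Two words are then compared by the cosine of their count vectors, so words that "keep the same company" score as similar.

```python
from collections import Counter, defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of context words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the target word itself
                    vectors[target][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[word] for word, count in u.items())  # missing keys count as 0
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus: "coffee" and "tea" share contexts, "car" mostly does not.
corpus = [
    "i drink hot coffee every morning".split(),
    "i drink hot tea every morning".split(),
    "she drives her car to work".split(),
    "he drives his car every day".split(),
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["coffee"], vecs["tea"]), 3))  # 1.0
print(round(cosine(vecs["coffee"], vecs["car"]), 3))  # 0.158
```

Real systems use far larger corpora. They usually also reweight raw counts (for example with pointwise mutual information) before comparing vectors. The window size here is an arbitrary assumption.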
Applications of statistical semantics
Research in Statistical Semantics has resulted in a wide variety of algorithms that use the Distributional Hypothesis to discover many aspects of semantics, by applying statistical techniques to large corpora:
- Measuring the similarity in word meanings
- Measuring the similarity in word relations
- Modeling similarity-based generalization
- Discovering words with a given relation
- Classifying relations between words
- Extracting keywords from documents
- Measuring the cohesiveness of text
- Discovering the different senses of words
- Distinguishing the different senses of words
- Modeling subcognitive aspects of words
- Distinguishing praise from criticism
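One of the applications listed above is distinguishing praise from criticism. Turney & Littman (2003) infer a word's semantic orientation from its association with positive and negative paradigm words: pointwise mutual information (PMI) with the positive set minus PMI with the negative set. The sketch below makes several assumptions of its own:

- document-level co-occurrence counts over a tiny hypothetical corpus;
- two illustrative paradigm words per side;
- add-one smoothing.

The original work estimated its statistics from very large corpora, so this only illustrates the idea and does not reproduce their method.

```python
from math import log2

def pmi(docs, a, b, smoothing=1.0):
    """PMI of words a and b from document co-occurrence, with add-`smoothing` counts (assumption)."""
    n = len(docs)
    count_a = sum(1 for d in docs if a in d) + smoothing
    count_b = sum(1 for d in docs if b in d) + smoothing
    count_ab = sum(1 for d in docs if a in d and b in d) + smoothing
    return log2((count_ab * n) / (count_a * count_b))

def semantic_orientation(docs, word, positive, negative):
    """SO-PMI style score: association with positive words minus association with negative words."""
    return (sum(pmi(docs, word, p) for p in positive)
            - sum(pmi(docs, word, q) for q in negative))

# Hypothetical review snippets; each document is treated as a set of tokens.
corpus = [
    "the food was excellent and the staff were good",
    "an excellent view and a good price",
    "the service was poor and the room was bad",
    "poor value and bad coffee",
    "the staff were friendly and the food was excellent",
    "the room was dirty and the service was poor",
]
docs = [set(text.split()) for text in corpus]
positive = ["good", "excellent"]  # illustrative paradigm words, not the original sets
negative = ["bad", "poor"]

so_friendly = semantic_orientation(docs, "friendly", positive, negative)
so_dirty = semantic_orientation(docs, "dirty", positive, negative)
print(round(so_friendly, 3))  # 1.0
print(round(so_dirty, 3))     # -1.0
```

A positive score suggests praise and a negative score suggests criticism. With counts this small, the exact values mean little; the sign is what the example is meant to show.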
Statistical Semantics focuses on the meanings of common words and the relations between common words, unlike text mining, which tends to focus on whole documents, document collections, or named entities (names of people, places, and organizations). Statistical Semantics is a subfield of computational semantics, which is in turn a subfield of computational linguistics and natural language processing.
Many of the applications of Statistical Semantics listed above can also be addressed by lexicon-based algorithms instead of the corpus-based algorithms of Statistical Semantics. One advantage of corpus-based algorithms is that they are typically less labour-intensive than lexicon-based algorithms, since they learn from raw text rather than depending on a hand-built lexicon. Another advantage is that they are usually easier to adapt to new languages. However, the best performance on an application is often achieved by combining the two approaches.
See also
- Latent semantic analysis
- Latent semantic indexing
- Text mining
- Information retrieval
- Natural language processing
- Computational linguistics
- Web mining
- Semantic similarity
- Text corpus
- Semantic Analytics
Notes
- Weaver 1955
- Firth 1957
- Sahlgren 2008
- Delavenay 1960
- Furnas et al. 1983
- Lund, Burgess & Atchley 1995
- Landauer & Dumais 1997
- McDonald & Ramscar 2001
- Terra & Clarke 2003
- Turney 2006
- Yarlett 2008
- Hearst 1992
- Turney & Littman 2005
- Frank et al. 1999
- Turney 2000
- Turney 2003
- Pantel & Lin 2002
- Turney 2004
- Turney 2001
- Turney & Littman 2003
- Turney et al. 2003
References
- Delavenay, Emile (1960). An Introduction to Machine Translation. New York, NY: Thames and Hudson. OCLC 1001646.
- Firth, John R. (1957). "A synopsis of linguistic theory 1930–1955". Studies in Linguistic Analysis (Oxford: Philological Society): 1–32.
- Frank, Eibe; Paynter, Gordon W.; Witten, Ian H.; Gutwin, Carl; Nevill-Manning, Craig G. (1999). "Domain-specific keyphrase extraction". Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence. IJCAI-99. Vol. 2. California: Morgan Kaufmann. pp. 668–673. ISBN 1-55860-613-0. CiteSeerX: 10.1.1.43.9100. CiteSeerX: 10.1.1.148.3598.
- Furnas, George W.; Landauer, T. K.; Gomez, L. M.; Dumais, S. T. (1983). "Statistical semantics: Analysis of the potential performance of keyword information systems". Bell System Technical Journal 62 (6): 1753–1806.
- Hearst, Marti A. (1992). "Automatic Acquisition of Hyponyms from Large Text Corpora". Proceedings of the Fourteenth International Conference on Computational Linguistics. COLING '92. Nantes, France. pp. 539–545. doi:10.3115/992133.992154. CiteSeerX: 10.1.1.36.701.
- Landauer, Thomas K.; Dumais, Susan T. (1997). "A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge". Psychological Review 104 (2): 211–240. CiteSeerX: 10.1.1.184.4759.
- Lund, Kevin; Burgess, Curt; Atchley, Ruth Ann (1995). "Semantic and associative priming in high-dimensional semantic space". Proceedings of the 17th Annual Conference of the Cognitive Science Society. Cognitive Science Society. pp. 660–665.
- McDonald, Scott; Ramscar, Michael (2001). "Testing the distributional hypothesis: The influence of context on judgements of semantic similarity". Proceedings of the 23rd Annual Conference of the Cognitive Science Society. pp. 611–616. CiteSeerX: 10.1.1.104.7535.
- Pantel, Patrick; Lin, Dekang (2002). "Discovering word senses from text". Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining. KDD '02. pp. 613–619. doi:10.1145/775047.775138. ISBN 1-58113-567-X. CiteSeerX: 10.1.1.12.6771.
- Sahlgren, Magnus (2008). "The Distributional Hypothesis". Rivista di Linguistica 20 (1): 33–53.
- Terra, Egidio L.; Clarke, Charles L. A. (2003). "Frequency estimates for statistical word similarity measures". Proceedings of the Human Language Technology and North American Chapter of Association of Computational Linguistics Conference 2003. HLT/NAACL 2003. pp. 244–251. doi:10.3115/1073445.1073477. CiteSeerX: 10.1.1.12.9041.
- Turney, Peter D. (May 2000). "Learning algorithms for keyphrase extraction". Information Retrieval 2 (4): 303–336. arXiv:cs/0212020. doi:10.1023/A:1009976227802. CiteSeerX: 10.1.1.11.1829.
- Turney, Peter D. (2001). "Answering subcognitive Turing Test questions: A reply to French". Journal of Experimental and Theoretical Artificial Intelligence 13 (4): 409–419. arXiv:cs/0212015. CiteSeerX: 10.1.1.12.8734.
- Turney, Peter D. (2003). "Coherent keyphrase extraction via Web mining". Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence. IJCAI-03. Acapulco, Mexico. pp. 434–439. arXiv:cs/0308033. CiteSeerX: 10.1.1.100.3751.
- Turney, Peter D. (2004). "Word sense disambiguation by Web mining for word co-occurrence probabilities". Proceedings of the Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text. SENSEVAL-3. Barcelona, Spain. pp. 239–242. arXiv:cs/0407065.
- Turney, Peter D. (2006). "Similarity of semantic relations". Computational Linguistics 32 (3): 379–416. arXiv:cs/0608100. doi:10.1162/coli.2006.32.3.379. CiteSeerX: 10.1.1.75.8007.
- Turney, Peter D.; Littman, Michael L. (October 2003). "Measuring praise and criticism: Inference of semantic orientation from association". ACM Transactions on Information Systems (TOIS) 21 (4): 315–346. arXiv:cs/0309034. doi:10.1145/944012.944013. CiteSeerX: 10.1.1.9.6425.
- Turney, Peter D.; Littman, Michael L. (2005). "Corpus-based Learning of Analogies and Semantic Relations". Machine Learning 60 (1–3): 251–278. arXiv:cs/0508103. doi:10.1007/s10994-005-0913-1. CiteSeerX: 10.1.1.90.9819.
- Turney, Peter D.; Littman, Michael L.; Bigham, Jeffrey; Shnayder, Victor (2003). "Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems". Proceedings of the International Conference on Recent Advances in Natural Language Processing. RANLP-03. Borovets, Bulgaria. pp. 482–489. arXiv:cs/0309035. CiteSeerX: 10.1.1.5.2939.
- Weaver, Warren (1955). "Translation". In Locke, W. N.; Booth, A. D. (eds.). Machine Translation of Languages. Cambridge, Massachusetts: MIT Press. pp. 15–23. ISBN 0-8371-8434-7.
- Yarlett, Daniel G. (2008). Language Learning Through Similarity-Based Generalization (PhD thesis). Stanford University.