Karen Spärck Jones

From Wikipedia, the free encyclopedia

Karen Spärck Jones was a computer science researcher and innovator who pioneered inverse document frequency (IDF), a term-weighting technique that underlies most modern search engines. While many early information scientists and computer engineers focused on developing programming languages and building computer systems, Spärck Jones thought it more valuable to develop information retrieval systems that could understand human language.[1]


Karen Spärck Jones was born in Huddersfield, Yorkshire, England, in 1935 and went on to study at Girton College, Cambridge. Although she did not study computer science, she began her research career at a small, specialized group called the Cambridge Language Research Unit (CLRU). While working at the CLRU, she pursued her Ph.D. When it was submitted, her Ph.D. thesis was dismissed as uninspired and lacking original thought, but it was later published in its entirety as a book.[2]

Professional Career

After completing her Ph.D., Spärck Jones continued to research computational approaches to language. Drawing on informal connections she made through her marriage to Roger Needham in 1958, she was able to continue her work on term clustering for language processing and information retrieval.[2] This research, combined with some of her husband’s published work, enabled Spärck Jones to develop her method of inverse document frequency.

Soon after she published her first paper on IDF in 1972, the technique gained traction in laboratory research, and laboratories such as the Vector Space Laboratory at Cornell had already made it a primary part of their procedures.[2]

Innovative Legacy

Karen Spärck Jones’s discoveries helped pave the way for modern information retrieval, which allows search engines to quickly identify the most relevant results among millions of candidate responses to an internet query. At the time, the problem she solved seemed specific to one strand of academic research, Information Retrieval, then a subfield of Computer Information Systems (CIS).[3] With the creation and widespread use of the internet in the years after her paper was published, however, it is clear that Spärck Jones also had a lasting impact on the general public.[4]

Although IDF on its own is no longer the driving force behind information retrieval, having been supplemented by more sophisticated designs developed in the decades since Spärck Jones’s research, her original papers remain among the most cited in the field of CIS. In this sense, her contribution is less a single invention than a foundation, laid through years of study and work at a time when her efforts were overlooked because, as a woman, her position in the field was not respected.

The impact Karen Spärck Jones left on the world is visible in today’s reliance on the internet and the World Wide Web for so many information needs. In the digital age, people expect questions to be answered through a simple search, with results ranked so that the first listed answer solves the user’s problem. The quantity of data will continue to grow, and in this era its effective use and analysis are of utmost importance.


[1] (Bowles, 2019)

[2] (Robertson & Tait, 2008)

[3] (Tait, 2007)

[4] (A Brief History, n.d.)


Bowles, N. (2019, January 2). Overlooked no more: Karen Sparck Jones, who established the basis for search engines. The New York Times. Retrieved November 28, 2022, from https://www.nytimes.com/2019/01/02/obituaries/karen-sparck-jones-overlooked.html

Robertson, S., & Tait, J. (2008). Karen Spärck Jones. Journal of the American Society for Information Science & Technology, 59(5), 852–854. https://doi.org/10.1002/asi.20784

Tait, J. I. (2007). Karen Spärck Jones. Computational Linguistics, 33(3), 289–291. https://doi.org/10.1162/coli.2007.33.3.289

University System of Georgia. (n.d.). A Brief History of the Internet. Online Library Learning Center. Retrieved November 28, 2022, from https://www.usg.edu/galileo/skills/about_ollc_site.phtml

Karen Spärck Jones

[Photo: Karen Spärck Jones in 2002]
Born: 26 August 1935, Huddersfield, Yorkshire, England
Died: 4 April 2007 (aged 71)[1]
Alma mater: University of Cambridge
Known for: Term frequency–inverse document frequency
Spouse: Roger Needham (m. 1958)
Scientific career
Fields: Computer science; Information retrieval; Natural language processing; Document retrieval
Institutions: University of Cambridge
Thesis: Synonymy and Semantic Classification (1964)
Doctoral advisor: Richard Braithwaite[1]

Karen Spärck Jones FBA (26 August 1935 – 4 April 2007) was a pioneering British computer scientist responsible for the concept of inverse document frequency (IDF), a term-weighting technique that underlies most modern search engines.[2][3][4][5] In 2019, The New York Times published her belated obituary in its series Overlooked,[6][7] calling her "a pioneer of computer science for work combining statistics and linguistics, and an advocate for women in the field."[8] Since 2008, the Karen Spärck Jones Award has recognized her achievements in information retrieval (IR)[9][10] and natural language processing (NLP) by honouring a new recipient for outstanding research in one or both of these fields.[11][12][13][14]

Early life and education

Karen Ida Boalth Spärck Jones was born in Huddersfield, Yorkshire, England. Spärck Jones was educated at a grammar school in Huddersfield and then from 1953 to 1956 at Girton College, Cambridge, studying history, with an additional final year in Moral Sciences (philosophy). She briefly became a schoolteacher[citation needed] before moving into computer science.[15]


Spärck Jones worked at the Cambridge Language Research Unit from the late 1950s,[16] then at the Cambridge University Computer Laboratory from 1974 until her retirement in 2002. From 1999 she held the post of Professor of Computers and Information.[1] Before 1999 she was employed on a series of short-term contracts.[8] She continued to work in the Computer Laboratory until shortly before her death. Her publications include nine books and numerous papers. A full list of her publications is available from the Cambridge Computer Laboratory.[17]

Her main research interests, since the late 1950s, were natural language processing and information retrieval. One of her most important contributions was the concept of inverse document frequency (IDF) weighting in information retrieval, which she introduced in a 1972 paper.[9][18] IDF is used in most search engines today, usually as part of the term frequency–inverse document frequency (TF–IDF) weighting scheme.[19] In 1982 she became involved in the Alvey Programme.[8]
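To make the idea of IDF weighting concrete, here is a minimal Python sketch using the common modern formulation idf(t) = log(N / n_t), where N is the number of documents and n_t is the number of documents containing term t. This is an illustrative assumption: the exact weighting defined in the 1972 paper is not necessarily identical, and the function names, the toy corpus, and the rule that unseen terms get zero weight are all hypothetical choices made for this example.

```python
import math
from collections import Counter

def idf(term: str, docs: list[list[str]]) -> float:
    """Inverse document frequency, log(N / n_t).

    N is the number of documents and n_t the number of documents that
    contain the term at least once. Hypothetical convention: a term that
    appears in no document gets weight 0.0 instead of dividing by zero.
    """
    n_t = sum(1 for doc in docs if term in doc)
    if n_t == 0:
        return 0.0
    return math.log(len(docs) / n_t)

def tf_idf(doc: list[str], docs: list[list[str]]) -> dict[str, float]:
    """Weight each term in `doc` by its raw term frequency times its IDF."""
    counts = Counter(doc)
    return {term: count * idf(term, docs) for term, count in counts.items()}

# Hypothetical toy corpus, already split into lowercase tokens.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]

# Print the TF-IDF weights of the first document, highest first.
weights = tf_idf(docs[0], docs)
for term, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{term:>4}: {w:.3f}")
```

In this toy corpus, "cat" and "mat" each occur in only one of the three documents and receive weight log 3 ≈ 1.10, while "the" scores only about 0.81 even though it occurs twice, because it appears in two of the three documents. Rarer terms are treated as more specific and therefore more useful for distinguishing relevant documents, which is the intuition behind IDF.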

Honours and awards

An annual Karen Spärck Jones Award and lecture is named in her honour.[20] In August 2017, the University of Huddersfield renamed one of its campus buildings in her honour. Formerly known as Canalside West, the Spärck Jones building houses the University's School of Computing and Engineering.[21] Other honours and awards include

Personal life

Spärck Jones married fellow Cambridge computer scientist Roger Needham in 1958.[16]


  1. "Jones, Karen Ida Boalth Spärck (1935–2007), Computer Scientist". Oxford Dictionary of National Biography (online ed.). Oxford University Press. doi:10.1093/ref:odnb/98729. (Subscription or UK public library membership required.)
  2. Video: Natural Language and the Information Layer, Karen Spärck Jones, March 2007
  3. University of Cambridge obituary
  4. Obituary, The Independent, 12 April 2007
  5. Robertson, S.; Tait, J. (2008). "Karen Spärck Jones". Journal of the American Society for Information Science and Technology. 59 (5): 852. doi:10.1002/asi.20784.
  6. Padnani, Amisha; Bennett, Jessica (8 March 2018). "Remarkable People We Overlooked in Our Obituaries". The New York Times. ISSN 0362-4331. Retrieved 7 December 2019.
  7. "Overlooked No More: Karen Sparck Jones, Who Established the Basis for Search Engines". The New York Times. 2 January 2019. ISSN 0362-4331. Retrieved 7 December 2019.
  8. "Overlooked No More: Karen Sparck Jones, Who Established the Basis for Search Engines". The New York Times. 2 January 2019. ISSN 0362-4331. Retrieved 3 January 2019.
  9. Spärck Jones, K. (1972). "A Statistical Interpretation of Term Specificity and Its Application in Retrieval". Journal of Documentation. 28: 11–21. doi:10.1108/eb026526. S2CID 2996187.
  10. Tait, John I., ed. (2005). Charting a New Course: Natural Language Processing and Information Retrieval, Essays in Honour of Karen Spärck Jones. The Kluwer International Series on Information Retrieval. Vol. 16. doi:10.1007/1-4020-3467-9. ISBN 978-1-4020-3343-8.
  11. Obituary, The Times, 22 June 2007 (subscription required)
  12. Computer Science, A Woman's Work, IEEE Spectrum, May 2007
  13. Thompson, Bill. "Karen Spärck Jones". A Stick a Dog and a Box With Something In It. Retrieved 1 August 2019. (originally published in The Times)
  14. Tait, J. I. (2007). "Karen Spärck Jones". Computational Linguistics. 33 (3): 289–291. doi:10.1162/coli.2007.33.3.289. S2CID 19790552.
  15. Karen Spärck Jones (1986). Synonymy and Semantic Classification (thesis published as a book). Edinburgh Information Technology series. Vol. 1. Edinburgh University Press. ISBN 9780852245170.
  16. Anon (2007). "Karen Spärck Jones, FBA Professor Emerita of Computers and Information Honorary Fellow of Wolfson College 26 August 1935 – 4 April 2007". cam.ac.uk. University of Cambridge.
  17. "Karen Sparck Jones Publications".
  18. Spärck Jones, K. (1973). "Index term weighting". Information Storage and Retrieval. 9 (11): 619–633. doi:10.1016/0020-0271(73)90043-0.
  19. Maybury, M. T. (2005). "Karen Spärck Jones and Summarization". Charting a New Course: Natural Language Processing and Information Retrieval. The Kluwer International Series on Information Retrieval. Vol. 16. pp. 99–10. doi:10.1007/1-4020-3467-9_7. ISBN 978-1-4020-3343-8.
  20. "Karen Spärck Jones lecture". BCS Academy of Computing. British Computer Society. Retrieved 3 October 2013.
  21. "How to find us – University of Huddersfield". hud.ac.uk. Retrieved 20 September 2017.
  22. "Karen Spärck Jones". The Daily Telegraph. 12 April 2007.
  23. Anon (2022). "Elected AAAI Fellows". aaai.org.
  24. "Karen Spärck Jones". The Computer Laboratory, Cambridge University. March 2007. Retrieved 2 April 2018.
  25. "Gerard Salton Awards". Special Interest Group on Information Retrieval. Retrieved 2 April 2018.
  26. "ACL Lifetime Achievement Award Recipients". ACL wiki. ACL. Retrieved 16 August 2014.
