Co-citation, like Bibliographic Coupling, is a semantic similarity measure for documents that makes use of citation relationships. Co-citation is defined as the frequency with which two documents are cited together by other documents. If at least one other document cites two documents in common, these documents are said to be co-cited. The more co-citations two documents receive, the higher their co-citation strength and the more likely they are to be semantically related.
The figure to the right illustrates the concept of co-citation and a more recent variation of co-citation that accounts for the placement of citations in the full text of documents. The figure's left image shows Documents A and B, which are both cited by Documents C, D and E; thus Documents A and B have a co-citation strength, or co-citation index, of three. This score is usually established using citation indexes. Documents featuring high numbers of co-citations are regarded as more similar.
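The counting illustrated in the figure's left image can be sketched in a few lines of Python. This is a minimal illustration, assuming the citation data is available as a mapping from each citing document to the set of documents it cites; the function name `cocitation_counts` and the sample data are hypothetical and only mirror the A/B example.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count, for every pair of cited documents, how many citing documents cite both.

    `citations` maps a citing document to the set of documents it cites.
    """
    counts = Counter()
    for cited in citations.values():
        # Sort so each unordered pair is always stored under the same key.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# Hypothetical data mirroring the left image: C, D and E each cite A and B.
citations = {
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A", "B"},
}

counts = cocitation_counts(citations)
print(counts[("A", "B")])  # co-citation strength of A and B: 3
```

Pairs that are never cited together simply have a count of zero, because `Counter` returns 0 for missing keys.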
The figure's right image shows a citing document that cites Documents 1, 2 and 3. Documents 1 and 2, as well as Documents 2 and 3, each have a co-citation strength of one, given that they are cited together by exactly one other document. However, Documents 2 and 3 are cited much closer to each other in the citing document than Documents 1 and 2 are. To make co-citation a more meaningful measure in this case, a Co-Citation Proximity Index (CPI) can be introduced to account for the placement of citations relative to each other. Documents co-cited at greater relative distances in the full text receive lower CPI values. Gipp and Beel were the first to propose using modified co-citation weights based on proximity.
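The idea of proximity-dependent weights can be sketched as follows. This is only an illustration under stated assumptions, not the published method: the weight values in `WEIGHTS`, the representation of a citation position as a `(section, paragraph, sentence)` tuple, and the choice to score a pair by its closest occurrence are all hypothetical.

```python
from itertools import combinations

# Illustrative weights (an assumption, not taken from the text above):
# the closer two citations are, the higher the weight.
WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25, "document": 0.125}


def proximity_level(pos_a, pos_b):
    """Return the smallest structural unit that contains both citations.

    Each position is a (section, paragraph, sentence) tuple of indices.
    """
    if pos_a == pos_b:
        return "sentence"
    if pos_a[:2] == pos_b[:2]:
        return "paragraph"
    if pos_a[0] == pos_b[0]:
        return "section"
    return "document"


def cpi_scores(citation_positions):
    """Score each co-cited pair by the closest proximity at which it occurs.

    `citation_positions` maps a cited document to the non-empty list of
    positions at which one citing document cites it.
    """
    scores = {}
    for (doc_a, pos_a), (doc_b, pos_b) in combinations(sorted(citation_positions.items()), 2):
        scores[(doc_a, doc_b)] = max(
            WEIGHTS[proximity_level(a, b)] for a in pos_a for b in pos_b
        )
    return scores


# Hypothetical positions mirroring the right image: Documents 2 and 3 are
# cited in the same sentence, Document 1 in a different section.
positions = {
    "Doc 1": [(0, 0, 0)],
    "Doc 2": [(2, 1, 3)],
    "Doc 3": [(2, 1, 3)],
}

scores = cpi_scores(positions)
print(scores[("Doc 2", "Doc 3")])  # 1.0
print(scores[("Doc 1", "Doc 2")])  # 0.125
```

With plain co-citation, both pairs would score one; with proximity weights, the pair cited in the same sentence scores higher than the pair cited in different sections.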
Henry Small and Irina Marshakova are credited with introducing co-citation analysis in 1973. Both researchers came up with the measure independently, although Marshakova gained less credit, likely because her work was published in Russian.
Co-citation analysis provides a forward-looking assessment of document similarity, in contrast to Bibliographic Coupling, which is retrospective. The citations a paper receives in the future depend on the evolution of an academic field, so co-citation frequencies can still change. In the adjacent diagram, for example, Doc A and Doc B may still be co-cited by future documents, say Doc F and Doc G. This characteristic makes co-citation a more dynamic basis for document classification than Bibliographic Coupling.
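The contrast can be made concrete with a small sketch. It assumes each document's reference list is stored as a set; the function names and the reference lists of A and B (documents X, Y and Z) are hypothetical and serve only to show that bibliographic coupling depends on what two documents cite, while co-citation depends on who cites them.

```python
def coupling_strength(references, a, b):
    """Bibliographic coupling: number of references that a and b share."""
    return len(references[a] & references[b])


def cocitation_strength(references, a, b):
    """Co-citation: number of documents whose reference lists contain both a and b."""
    return sum(1 for cited in references.values() if a in cited and b in cited)


# Reference lists at some point in time (X, Y and Z are made-up documents).
references = {
    "A": {"X", "Y"},
    "B": {"Y", "Z"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A", "B"},
}
before = (
    coupling_strength(references, "A", "B"),
    cocitation_strength(references, "A", "B"),
)

# Later publications F and G also cite A and B together.
references["F"] = {"A", "B"}
references["G"] = {"A", "B"}
after = (
    coupling_strength(references, "A", "B"),
    cocitation_strength(references, "A", "B"),
)

print(before)  # (1, 3)
print(after)   # (1, 5)
```

The coupling strength stays at one because the reference lists of A and B are fixed once they are published, whereas the co-citation strength grows from three to five as new citing documents appear.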
Over the decades, researchers have proposed variants of and enhancements to the original co-citation concept. Howard White introduced author co-citation analysis in 1981. Gipp and Beel proposed Co-citation Proximity Analysis (CPA) and introduced the CPI as an enhancement to the original co-citation concept in 2009. Co-citation Proximity Analysis considers the proximity of citations within the full texts when computing similarity and therefore allows for a more fine-grained assessment of semantic document similarity than pure co-citation.
Authors' motivations for citing literature vary greatly, and citations occur for a variety of reasons aside from simply referring to academically relevant documents. Cole and Cole expressed this concern based on the observation that scientists tend to cite friends and research colleagues more frequently, a partiality known as cronyism. Additionally, it has been observed that academic works that have already gained much credit and reputation in a field tend to receive even more credit, and thus more citations, in future literature, an observation termed the Matthew effect in science.
- Bibliographic coupling
- Co-citation Proximity Analysis
- CITREC, an evaluation framework for citation-based similarity measures including Bibliographic coupling, Co-citation, Co-citation Proximity Analysis and others.
- Jeppe Nicolaisen, "Co-citation", in Birger Hjørland, ed., Core Concepts in Library and Information Science.
- Henry Small, 1973. "Co-citation in the scientific literature: A new measure of the relationship between two documents". Journal of the American Society for Information Science (JASIS), volume 24(4), pp. 265–269. doi:10.1002/asi.4630240406.
- Bela Gipp and Joeran Beel, 2009 "Citation Proximity Analysis (CPA) – A new approach for identifying related work based on Co-Citation Analysis" in Birger Larsen and Jacqueline Leta, editors, Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI’09), volume 2, pages 571–575, Rio de Janeiro (Brazil), July 2009.
- Kevin W. Boyack, Henry Small and Richard Klavans, 2013. "Improving the Accuracy of Co-citation Clustering Using Full Text". Journal of the American Society for Information Science and Technology, volume 64(9), pp. 1759–1767, September 2013.
- Irina Marshakova Shaikevich, 1973. "System of Document Connections Based on References". Scientific and Technical Information Serial of VINITI, 6(2):3–8.
- Jeppe Nicolaisen, 2005. "Co-citation", from The Royal School of Library and Information Science (RSLIS), Copenhagen, Denmark. Retrieved December 28, 2012.
- Frank Havemann, 2009. "Einführung in die Bibliometrie." Humboldt University of Berlin.
- Garfield, E., November 27, 2001. "From Bibliographic Coupling to Co-Citation Analysis Via Algorithmic Historio-Bibliography: A Citationist’s Tribute to Belver C. Griffith". Paper presented at Drexel University, Philadelphia, PA.
- Howard D. White and Belver C. Griffith, 1981. "Author Cocitation: A Literature Measure of Intellectual Structure". Journal of the American Society for Information Science (JASIS), volume 32(3), pp. 163–171, May 1981 (the first ACA paper). doi:10.1002/asi.4630320302.
- M. Schwarzer, M. Schubotz, N. Meuschke, C. Breitinger, V. Markl, and B. Gipp, "Evaluating Link-based Recommendations for Wikipedia" in Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), New York, NY, USA, 2016, pp. 191-200.
- Cole, J. R. & Cole, S., 1973. "Social Stratification in Science". Chicago, IL: University of Chicago Press.
- Bela Gipp, Norman Meuschke & Mario Lipinski, 2015. "CITREC: An Evaluation Framework for Citation-Based Similarity Measures based on TREC Genomics and PubMed Central" in Proceedings of the iConference 2015, Newport Beach, California, 2015.