Sørensen similarity index
The Sørensen index, also known as Sørensen’s similarity coefficient, is a statistic used for comparing the similarity of two samples. It was developed by the botanist Thorvald Sørensen and published in 1948[1].
The shorthand version of the formula, as applied to qualitative data, is

$$QS = \frac{2C}{A + B}$$

where A and B are the numbers of species in samples A and B, respectively, and C is the number of species shared by the two samples. This expression is easily extended to abundance instead of incidence of species; this quantitative version of the Sørensen index is also known as the Czekanowski index. The Sørensen index is identical to Dice's coefficient and always lies in the range [0, 1]. Used as a distance measure, 1 - QS, the quantitative Sørensen index is identical to the Bray-Curtis dissimilarity.
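A minimal Python sketch of both forms follows. It assumes species lists as input for incidence data and species-to-abundance dictionaries for abundance data. The function names `sorensen_incidence`, `sorensen_abundance` and `bray_curtis` are hypothetical helpers, not a library API. The abundance version uses the common Czekanowski form 2·Σ min(x_i, y_i) / (Σ x_i + Σ y_i), and the site data are invented for illustration.

```python
from collections.abc import Iterable, Mapping


def sorensen_incidence(a: Iterable[str], b: Iterable[str]) -> float:
    """Qualitative Sørensen index QS = 2C / (A + B) from two species lists."""
    set_a, set_b = set(a), set(b)
    total = len(set_a) + len(set_b)  # A + B
    if total == 0:
        raise ValueError("both samples are empty")
    shared = len(set_a & set_b)  # C: species present in both samples
    return 2 * shared / total


def sorensen_abundance(x: Mapping[str, float], y: Mapping[str, float]) -> float:
    """Quantitative (Czekanowski) form: 2 * sum(min(x_i, y_i)) / (sum(x) + sum(y))."""
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        raise ValueError("both samples have zero total abundance")
    # A species absent from a sample counts as abundance 0.
    shared = sum(min(x.get(s, 0), y.get(s, 0)) for s in set(x) | set(y))
    return 2 * shared / total


def bray_curtis(x: Mapping[str, float], y: Mapping[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum |x_i - y_i| / (sum(x) + sum(y))."""
    total = sum(x.values()) + sum(y.values())
    diff = sum(abs(x.get(s, 0) - y.get(s, 0)) for s in set(x) | set(y))
    return diff / total


# Invented example sites.
site1 = ["oak", "ash", "birch", "elm"]
site2 = ["oak", "birch", "pine"]
print(round(sorensen_incidence(site1, site2), 3))  # A=4, B=3, C=2 -> 0.571

counts1 = {"oak": 5, "ash": 2, "birch": 3}
counts2 = {"oak": 3, "birch": 4, "pine": 1}
qs = sorensen_abundance(counts1, counts2)
print(round(qs, 3), round(1 - qs, 3), round(bray_curtis(counts1, counts2), 3))
# -> 0.667 0.333 0.333
```

With 0/1 abundances, the quantitative version reduces to the qualitative one. On abundances, 1 - QS matches the Bray-Curtis value because |x_i - y_i| = x_i + y_i - 2·min(x_i, y_i) for every species.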
The Sørensen coefficient is mainly useful for ecological community data (e.g. Looman & Campbell, 1960[2]). Justification for its use is primarily empirical rather than theoretical, although it can be justified theoretically as the intersection of two fuzzy sets[3]. Compared with Euclidean distance, Sørensen distance retains sensitivity in more heterogeneous data sets and gives less weight to outliers[4].
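The contrast with Euclidean distance can be illustrated with a small constructed example. This is a sketch with invented data that is not taken from the cited sources. The helpers `euclidean` and `sorensen_distance` are hypothetical names, and `sorensen_distance` uses the quantitative form 1 - 2·Σ min(x_i, y_i) / (Σ x_i + Σ y_i).

Euclidean distance responds to absolute abundance differences. As a result, two samples with no species in common can come out closer than two samples with identical species lists but different abundances. The Sørensen distance is bounded in [0, 1] and ranks these pairs the other way.

```python
import math
from collections.abc import Mapping


def euclidean(x: Mapping[str, float], y: Mapping[str, float]) -> float:
    """Euclidean distance over the union of species (missing species = 0)."""
    return math.sqrt(sum((x.get(s, 0) - y.get(s, 0)) ** 2 for s in set(x) | set(y)))


def sorensen_distance(x: Mapping[str, float], y: Mapping[str, float]) -> float:
    """Quantitative Sørensen distance 1 - QS, always in [0, 1]."""
    total = sum(x.values()) + sum(y.values())
    shared = sum(min(x.get(s, 0), y.get(s, 0)) for s in set(x) | set(y))
    return 1 - 2 * shared / total


# Invented abundances: A and C hold the same two species at different
# abundances; B holds a single species that neither of them has.
site_a = {"sp2": 1, "sp3": 1}
site_b = {"sp1": 1}
site_c = {"sp2": 4, "sp3": 4}

print(round(euclidean(site_a, site_b), 3), round(euclidean(site_a, site_c), 3))
# -> 1.732 4.243  (Euclidean: A looks closer to B, with which it shares nothing)
print(sorensen_distance(site_a, site_b), round(sorensen_distance(site_a, site_c), 3))
# -> 1.0 0.6  (Sørensen: A is closer to C, with which it shares every species)
```

Because it is a ratio, the Sørensen distance is also unchanged when both samples are multiplied by the same constant. The Euclidean distance, by contrast, scales with that constant.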
See also
- Jaccard index
- Kulczyński similarity index
- Renkonen similarity index
- Czekanowski similarity index
- Hamming distance
- Correlation
- Dice's coefficient
References
- [1] Sørensen, T. (1948) A method of establishing groups of equal amplitude in plant sociology based on similarity of species and its application to analyses of the vegetation on Danish commons. Biologiske Skrifter / Kongelige Danske Videnskabernes Selskab, 5 (4): 1-34.
- [2] Looman, J. and Campbell, J.B. (1960) Adaptation of Sorensen's K (1948) for estimating unit affinities in prairie vegetation. Ecology 41 (3): 409-416.
- [3] Roberts, D.W. (1986) Ordination on the basis of fuzzy set theory. Vegetatio 66 (3): 123-131.
- [4] McCune, B. and Grace, J. (2002) Analysis of Ecological Communities. MjM Software Design. ISBN 0972129006.