Tversky index

The Tversky index, named after Amos Tversky,[1] is an asymmetric similarity measure on sets that compares a variant to a prototype. The Tversky index can be seen as a generalization of the Sørensen–Dice coefficient and the Jaccard index.

For sets X and Y, the Tversky index is a number between 0 and 1 given by

${\displaystyle S(X,Y)={\frac {|X\cap Y|}{|X\cap Y|+\alpha |X\setminus Y|+\beta |Y\setminus X|}}}$

Here, ${\displaystyle X\setminus Y}$ denotes the relative complement of Y in X.

Further, ${\displaystyle \alpha ,\beta \geq 0}$ are parameters of the Tversky index. Setting ${\displaystyle \alpha =\beta =1}$ produces the Jaccard index; setting ${\displaystyle \alpha =\beta =0.5}$ produces the Sørensen–Dice coefficient.
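The definition and its two special cases can be checked numerically. The following is a minimal Python sketch, not a reference implementation: the function name `tversky_index` and the example sets are illustrative, and returning 0.0 when the denominator is zero is an assumed convention, since the formula is undefined in that case.

```python
def tversky_index(x, y, alpha, beta):
    """Tversky index S(X, Y) of two finite sets (illustrative sketch)."""
    if alpha < 0 or beta < 0:
        raise ValueError("alpha and beta must be non-negative")
    x, y = set(x), set(y)
    common = len(x & y)   # |X ∩ Y|
    only_x = len(x - y)   # |X \ Y|
    only_y = len(y - x)   # |Y \ X|
    denominator = common + alpha * only_x + beta * only_y
    if denominator == 0:
        # Assumption: the formula is undefined here; 0.0 is an arbitrary convention.
        return 0.0
    return common / denominator


X = {"a", "b", "c", "d"}
Y = {"c", "d", "e"}

# alpha = beta = 1 gives the Jaccard index |X ∩ Y| / |X ∪ Y| = 2/5.
jaccard = tversky_index(X, Y, 1.0, 1.0)

# alpha = beta = 0.5 gives the Sørensen–Dice coefficient 2|X ∩ Y| / (|X| + |Y|) = 4/7.
dice = tversky_index(X, Y, 0.5, 0.5)
```

For these sets, |X ∩ Y| = 2, |X \ Y| = 2 and |Y \ X| = 1, so both values can be verified by hand against the formula.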

If we consider X to be the prototype and Y to be the variant, then ${\displaystyle \alpha }$ corresponds to the weight of the prototype and ${\displaystyle \beta }$ corresponds to the weight of the variant. Tversky measures with ${\displaystyle \alpha +\beta =1}$ are of special interest.[2]
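The asymmetry can be seen by swapping the arguments while keeping the weights fixed. The sketch below is illustrative only: the helper `tversky_index`, the example sets, and the weights 0.8 and 0.2 are assumptions chosen so that the weights sum to 1, and a zero denominator is mapped to 0.0 by convention.

```python
def tversky_index(x, y, alpha, beta):
    """Tversky index with x as the prototype and y as the variant (sketch)."""
    x, y = set(x), set(y)
    common = len(x & y)
    denominator = common + alpha * len(x - y) + beta * len(y - x)
    # Assumption: return 0.0 when the formula is undefined.
    return common / denominator if denominator else 0.0


prototype = {"a", "b", "c", "d"}
variant = {"c", "d", "e"}

# Same weights, arguments swapped: the two results differ.
forward = tversky_index(prototype, variant, 0.8, 0.2)    # 2 / 3.8
backward = tversky_index(variant, prototype, 0.8, 0.2)   # 2 / 3.2

# With alpha = 1 and beta = 0 the index reduces to |X ∩ Y| / |X|,
# the fraction of the prototype that the variant shares.
covered = tversky_index(prototype, variant, 1.0, 0.0)    # 2 / 4
```

Only when the two weights are equal, or when the two set differences happen to have the same size, do the two argument orders give the same value.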

Because of the inherent asymmetry, the Tversky index does not meet the criteria for a similarity metric. However, if symmetry is needed, a variant of the original formulation has been proposed that uses max and min functions:[3]

${\displaystyle S(X,Y)={\frac {|X\cap Y|}{|X\cap Y|+\beta \left(\alpha a+(1-\alpha )b\right)}}}$

where

${\displaystyle a=\min \left(|X\setminus Y|,|Y\setminus X|\right)}$,

${\displaystyle b=\max \left(|X\setminus Y|,|Y\setminus X|\right)}$.

This formulation also rearranges the roles of the parameters ${\displaystyle \alpha }$ and ${\displaystyle \beta }$. Here, ${\displaystyle \alpha }$ controls the balance between the smaller and the larger of ${\displaystyle |X\setminus Y|}$ and ${\displaystyle |Y\setminus X|}$ in the denominator, while ${\displaystyle \beta }$ controls the effect of the symmetric difference ${\displaystyle |X\,\triangle \,Y\,|}$ versus ${\displaystyle |X\cap Y|}$ in the denominator.
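The symmetric variant can be sketched in the same way. The function name `symmetric_tversky` and the example sets are illustrative assumptions, as is the convention of returning 0.0 for a zero denominator. Substituting into the formula shows that α = 0.5 with β = 1 reproduces the Sørensen–Dice coefficient and α = 0.5 with β = 2 reproduces the Jaccard index, which the example uses as a sanity check.

```python
def symmetric_tversky(x, y, alpha, beta):
    """Symmetric Tversky variant built from min and max (illustrative sketch)."""
    x, y = set(x), set(y)
    common = len(x & y)                 # |X ∩ Y|
    only_x, only_y = len(x - y), len(y - x)
    a = min(only_x, only_y)             # smaller set difference
    b = max(only_x, only_y)             # larger set difference
    denominator = common + beta * (alpha * a + (1 - alpha) * b)
    # Assumption: return 0.0 when the formula is undefined.
    return common / denominator if denominator else 0.0


X = {"a", "b", "c", "d"}
Y = {"c", "d", "e"}

# Swapping the arguments leaves a and b unchanged, so the value is the same.
s_xy = symmetric_tversky(X, Y, 0.3, 1.0)
s_yx = symmetric_tversky(Y, X, 0.3, 1.0)

dice = symmetric_tversky(X, Y, 0.5, 1.0)      # 2 / 3.5 = 4/7
jaccard = symmetric_tversky(X, Y, 0.5, 2.0)   # 2 / 5
```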

Notes

1. ^ Tversky, Amos (1977). "Features of Similarity" (PDF). Psychological Review. 84 (4): 327–352. doi:10.1037/0033-295x.84.4.327.
2. ^
3. ^ Jimenez, S.; Becerra, C.; Gelbukh, A. (2013). "SOFTCARDINALITY-CORE: Improving Text Overlap with Distributional Measures for Semantic Textual Similarity". Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity. Atlanta, Georgia, USA, June 7–8, 2013. pp. 194–201.