Talk:Sørensen–Dice coefficient

From Wikipedia, the free encyclopedia

Wouldn't you count the start ($) and end (^) of the word as letters? Then there are 6 digrams each in "night" and "nacht", and they share 3 ($n, ht, and t^), giving a coefficient of 50%. Homunq (talk) 00:43, 27 February 2009 (UTC)
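A minimal Python sketch of the digram count above. It assumes the `$`/`^` padding convention from the comment and treats each word's digrams as a set, which is fine for these two words because neither has a repeated digram. The helper names `bigrams` and `dice` are illustrative only:

```python
def bigrams(word, pad=False):
    """Return the list of adjacent character pairs (digrams) in word.

    If pad is True, the word is wrapped in "$" (start) and "^" (end)
    markers, following the convention suggested in the comment above.
    """
    if pad:
        word = "$" + word + "^"
    return [word[i:i + 2] for i in range(len(word) - 1)]


def dice(x, y):
    """Sørensen–Dice coefficient 2|X ∩ Y| / (|X| + |Y|) on digram sets."""
    xs, ys = set(x), set(y)
    return 2 * len(xs & ys) / (len(xs) + len(ys))


print(dice(bigrams("night"), bigrams("nacht")))                      # 0.25
print(dice(bigrams("night", pad=True), bigrams("nacht", pad=True)))  # 0.5
```

Without the padding the two words share only "ht" out of 4 digrams each, giving 2 × 1 / (4 + 4) = 0.25. With the padding they share 3 out of 6 each, giving 2 × 3 / (6 + 6) = 0.5.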

Does it induce a proper metric?

Jaccard does. But is this

$$d(X, Y) = 1 - \frac{2|X \cap Y|}{|X| + |Y|}$$

a metric? Can someone add this (or the opposite) to the article? bungalo (talk) 20:45, 31 January 2011 (UTC)

No, it's not. I'll add something to show why not.

RichardThePict (talk) 15:05, 13 November 2011 (UTC)
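One way to probe the question is a brute-force search over all non-empty subsets of a small universe, assumed here to be {a, b, c}, looking for triples that break the triangle inequality. This Python sketch uses exact fractions so that rounding cannot create spurious violations. The function names are hypothetical, and finding no Jaccard violations on such a small universe is only a sanity check, not a proof:

```python
from fractions import Fraction
from itertools import combinations, product


def dice_distance(x, y):
    # 1 - 2|X ∩ Y| / (|X| + |Y|); assumes x and y are not both empty
    return 1 - Fraction(2 * len(x & y), len(x) + len(y))


def jaccard_distance(x, y):
    # 1 - |X ∩ Y| / |X ∪ Y|; assumes x and y are not both empty
    return 1 - Fraction(len(x & y), len(x | y))


def nonempty_subsets(universe):
    # Every non-empty subset of the universe, as frozensets
    items = sorted(universe)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def triangle_violations(dist, sets):
    # All ordered triples (x, y, z) with d(x, z) > d(x, y) + d(y, z)
    return [(x, y, z)
            for x, y, z in product(sets, repeat=3)
            if dist(x, z) > dist(x, y) + dist(y, z)]


sets = nonempty_subsets({"a", "b", "c"})
print(len(triangle_violations(jaccard_distance, sets)))  # 0
print(len(triangle_violations(dice_distance, sets)) > 0)  # True
```

Among the Dice violations is the triple ({a}, {a,b}, {b}), where d({a},{b}) = 1 exceeds 1/3 + 1/3.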

Proposed Merge

This is identical to the Sørensen similarity index. I think the two articles should be merged, but I don't know what the best name for the merged article would be. The formula is sometimes called the Sørensen–Dice coefficient. Maghnus (talk) 19:50, 31 August 2011 (UTC)

Counterexample for triangle inequality is wrong

I think that the counterexample for the triangle inequality is wrong. dist({a},{b}) = 1, dist({a},{a,b}) = 1/3, dist({b},{a,b}) = 1/3; so far everything is fine.

But then the check of the triangle inequality is:

dist({a},{b})+dist({b},{a,b}) > dist({a},{a,b})

1 + 1/3 > 1/3, so there is no violation! — Preceding unsigned comment added by Ironmanlu (talk • contribs) 13:01, 6 June 2014 (UTC)

  • The counterexample is correct. The triangle inequality states that the sum of the lengths of any two sides must be greater than or equal to the length of the remaining side, and this must hold for every choice of two sides, not just one. Ironmanlu's check tests dist({a},{b}) + dist({b},{a,b}) ≥ dist({a},{a,b}), but we must also test dist({a},{b}) + dist({a},{a,b}) ≥ dist({b},{a,b}) and dist({a},{a,b}) + dist({b},{a,b}) ≥ dist({a},{b}). These give 1 + 1/3 ≥ 1/3 (again fine) and 1/3 + 1/3 = 2/3, which is less than the required value of 1. The triangle inequality therefore does not hold, and the counterexample is valid. Neuropsychiatry (talk) 13:20, 24 June 2014 (UTC)
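The three checks in the reply above can be reproduced in a short Python sketch, assuming the distance 1 − 2|X ∩ Y| / (|X| + |Y|), which gives the values 1 and 1/3 quoted in this thread. The labels x, y, z are introduced here only for illustration:

```python
from fractions import Fraction


def dice_distance(x, y):
    # 1 - 2|X ∩ Y| / (|X| + |Y|), computed exactly with fractions
    return 1 - Fraction(2 * len(x & y), len(x) + len(y))


x, y, z = frozenset({"a"}), frozenset({"b"}), frozenset({"a", "b"})
d_xy = dice_distance(x, y)  # 1
d_xz = dice_distance(x, z)  # 1/3
d_yz = dice_distance(y, z)  # 1/3

# The triangle inequality must hold for every side, not just one of them.
checks = {
    "d(x,z) <= d(x,y) + d(y,z)": d_xz <= d_xy + d_yz,
    "d(y,z) <= d(y,x) + d(x,z)": d_yz <= d_xy + d_xz,
    "d(x,y) <= d(x,z) + d(z,y)": d_xy <= d_xz + d_yz,
}
for label, holds in checks.items():
    print(label, holds)
```

The first two checks print True. Only the last prints False, because d(x,y) = 1 is larger than 1/3 + 1/3 = 2/3. This is exactly the violation pointed out in the reply.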

Notational confusion

The article currently says

Sørensen's original formula was intended to be applied to presence/absence data, and is

$$QS = \frac{2C}{A + B} = \frac{2|A \cap B|}{|A| + |B|}$$

where A and B are the number of species in samples A and B, respectively, and C is the number of species shared by the two samples

The two parts of this use two conflicting notational systems. The stated definitions of A, B and C fit the first expression for QS. But with A and B being numbers, the last expression, which contains the intersection of the two numbers A and B, makes no sense. The definitions intended in the last expression are that A and B are sets and the vertical bars denote cardinality.

I'm going to revise this to use only the set notation here, because I think it fits in best with what follows. Loraof (talk) 17:34, 26 March 2016 (UTC)
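To illustrate that the two notations describe the same quantity once A, B and C are read consistently (counts in one expression, sets in the other), here is a small Python sketch. The species names are made-up placeholders, not data from Sørensen's paper, and the function names are hypothetical:

```python
def qs_from_counts(a, b, c):
    # A, B: number of species in each sample; C: number of shared species
    return 2 * c / (a + b)


def qs_from_sets(sample_a, sample_b):
    # A, B as sets of species; len() is cardinality, & is intersection
    return 2 * len(sample_a & sample_b) / (len(sample_a) + len(sample_b))


# Hypothetical presence/absence data for two samples
sample_a = {"oak", "birch", "pine", "fir"}
sample_b = {"oak", "pine", "maple"}

print(qs_from_counts(len(sample_a), len(sample_b), len(sample_a & sample_b)))
print(qs_from_sets(sample_a, sample_b))
```

Here A = 4, B = 3 and C = 2, so both lines print the same value, 2 × 2 / (4 + 3) = 4/7 ≈ 0.571.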

Dice published first: why is naming preference given to Sørensen?

Please add two explanations or else revise this article: 1) why Sørensen's name is included, since he wasn't the first to publish; 2) why Dice's name comes second, since he was the first to publish.

It appears this should be called the Dice-Sørensen coefficient or simply Dice's Coefficient. Is prejudice against Americans on display here?

(It is also curious that the former has a Wikipedia page while the latter does not.) — Preceding unsigned comment added by Newagelink (talk • contribs) 06:55, 8 June 2016 (UTC)