Talk:False discovery rate

WikiProject Statistics (Rated C-class, High-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as C-Class on the quality scale.
This article has been rated as High-importance on the importance scale.


Methods

Should probably include reference to at least the Bonferroni correction, Fisher's LSD (as possible exposition of the problem) and Tukey method. While these methods are not as advanced as those listed, they do form a good basis for the area. HyDeckar 14:07, 17 March 2007 (UTC)

To me, these seem more relevant for the multiple comparison article than here. Tal Galili (talk) 10:44, 17 February 2013 (UTC)

Algorithms

This article needs sections covering the different FDR methods such as step-up, step-down control, and something about adaptive versus non-adaptive control --Zven 02:10, 21 September 2006 (UTC)

The statement “proportion of incorrectly rejected type I errors,” i.e. “proportion of incorrectly rejected false positives,” does not make much sense. Should it perhaps read “proportion of incorrectly rejected null hypotheses in a list of rejected hypotheses”? Why not use the definition by Benjamini and Hochberg: “expected proportion of errors among the rejected hypotheses”? In other words: the expected ratio of the number of false positives to the number of rejected null hypotheses. Ref.: Benjamini and Hochberg (1995), J. R. Statist. Soc. B 57, pp. 289-300. Jarda Novak 16:23, 1 November 2006 (UTC)

For the purposes of clarity, something like your statement, “proportion of incorrectly rejected null hypotheses (type 1 errors) in a list of rejected hypotheses” would be an improvement --Zven 23:08, 2 November 2006 (UTC)

The variables in the table under 'classification of m hypothesis tests' seem to have been incorrectly defined. According to Benjamini & Hochberg (1995), the proportion of errors committed by falsely rejecting the null is Q=V/(V+S); thus V is the number of nulls falsely rejected (false positives) and V+S is the total number of nulls rejected. Therefore S is the number of false nulls rejected, i.e. true negatives. However, in the table, variable S has been defined as a true positive and conversely U has been defined as a true negative, when in fact it's a true positive.

I didn't want to edit the page without checking first. Does anyone have any thoughts on this? Have I totally lost my mind, or am I right? Natashia muna 14:05, 25 October 2007 (UTC)
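For reference, a minimal sketch of the bookkeeping being discussed, assuming the convention of Benjamini & Hochberg (1995): U and V count true nulls (not rejected / rejected), T and S count false nulls (not rejected / rejected), and R = V + S. The class name, function name, and toy data are hypothetical and only illustrate the definitions; under this convention a rejected false null (counted in S) is a correct discovery.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    U: int  # true nulls not rejected (true negatives)
    V: int  # true nulls rejected (false positives, type I errors)
    T: int  # false nulls not rejected (false negatives, type II errors)
    S: int  # false nulls rejected (true positives)

    @property
    def R(self) -> int:
        # Total number of rejected null hypotheses.
        return self.V + self.S

    @property
    def Q(self) -> float:
        # Proportion of false rejections; taken to be 0 when nothing is rejected.
        return self.V / self.R if self.R > 0 else 0.0


def classify(null_is_true, rejected):
    """Tally the four cells from per-hypothesis truth and decision flags."""
    U = V = T = S = 0
    for h0, rej in zip(null_is_true, rejected):
        if h0 and rej:
            V += 1
        elif h0:
            U += 1
        elif rej:
            S += 1
        else:
            T += 1
    return Counts(U, V, T, S)


# Hypothetical toy data: six hypotheses, three of them true nulls.
null_is_true = [True, True, True, False, False, False]
rejected = [False, True, False, True, True, False]
counts = classify(null_is_true, rejected)
print(counts, counts.R, counts.Q)
```

In practice null_is_true is unknown, so these counts (other than R) can only be tallied in a simulation such as this one.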

Copyrighted material

The pdf file pointed to by reference [1] (Benjamini and Hochberg) is copyrighted material taken from the jstor.org archive and posted contrary to jstor.org's explicitly stated policies. The pointer to the pdf file should be removed; however the citation can stay. Bill Jefferys 21:49, 22 December 2006 (UTC)

The link in question was on the homepage of Yoav Benjamini (one of the primary authors), so he is at fault for not adhering to any copyright on jstor. This is an interesting issue, as he is the person in breach of copyright. Is creating a link to his breach also a breach, or does it just further incriminate the author in question? Anyone who wants to can always use a search engine to look for 'Benjamini & Hochberg' anyway, since the paper is indexed by Google and others.

Hmm, interesting legal quandary here: as noted above, this apparent violation of the publisher's copyright is on the homepage of its author. In most cases, actually, the manuscript belongs to the author(s) until the author(s) transfer copyright to the journal. Therefore, depending on what documents were signed by their respective authors, it is entirely possible that one paper could legally be posted on the homepage of its author while another from the same journal would be in violation. We do not know what exact legalese Benjamini and Hochberg signed and therefore cannot determine whether Benjamini is violating that legalese by posting a copy of his paper on his website. In any case, I don't really see how Wikipedia is in breach of the law either way, because the protected content isn't uploaded; all the article does is point readers at Benjamini's page, which could readily be found by many other means anyway. 71.235.75.86 (talk) 00:09, 11 February 2008 (UTC)


Precision needed

I noticed in the Dependent tests part that c(m)=1 for positively correlated tests, i.e. the same value as for independent tests. As someone unfamiliar with the issue, I find this surprising. Thus I think it should be explicitly written that the value of c(m) for positively correlated tests is the same as for independent tests. At the moment it looks like an error in the article. —The preceding unsigned comment was added by 129.199.1.20 (talk) 09:31, 15 February 2007 (UTC).
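To make the role of c(m) concrete, here is a minimal sketch of the step-up rule, assuming the usual formulation: find the largest k with p_(k) <= k * alpha / (m * c(m)) and reject the k smallest p-values, with c(m) = 1 for independent or positively dependent tests and c(m) = 1 + 1/2 + ... + 1/m under arbitrary dependence. The function name, the flag name, and the p-values are hypothetical.

```python
def step_up_reject(pvalues, alpha=0.05, arbitrary_dependence=False):
    """Return one rejection flag per p-value, in the original order."""
    m = len(pvalues)
    if m == 0:
        return []
    # c(m) = 1 for independent / positively dependent tests,
    # c(m) = sum_{i=1}^{m} 1/i under arbitrary dependence.
    c_m = sum(1.0 / i for i in range(1, m + 1)) if arbitrary_dependence else 1.0
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (m * c_m):
            k = rank  # keep the largest rank that meets its threshold
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
with_c1 = step_up_reject(pvals, alpha=0.05)
with_harmonic = step_up_reject(pvals, alpha=0.05, arbitrary_dependence=True)
print(sum(with_c1), sum(with_harmonic))
```

The only difference between the two calls is the divisor c(m), so the arbitrary-dependence version can never reject more hypotheses than the c(m) = 1 version on the same p-values.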

Title does not match content

The title of the article is "False discovery rate", which is a rate, not a statistical method. False discovery rate control is a statistical method. The content should be moved to a new article, called "False discovery rate control" or something appropriate. The current article, "False discovery rate", should be re-written about the rate of false discoveries. -Pgan002 03:51, 10 May 2007 (UTC)

False discovery rate (FDR) is an accepted name for this technique in the statistics literature. --Zvika 07:39, 30 August 2007 (UTC)
I agree with Zvika. Tal Galili (talk) 20:16, 15 February 2013 (UTC)

FDR is not expected FPR

Changed text to state that the FDR is the expected proportion of false positives among all significant hypotheses. Previously it stated that it is the expected FPR, which is quite wrong. The false positive rate (FPR), unlike the FDR, is not a Bayesian measure (i.e. one that incorporates the prior probabilities of the hypotheses). — Preceding unsigned comment added by Brianbjparker (talk • contribs) 15:06, 9 December 2010 (UTC)
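A small numeric sketch of the distinction, assuming the standard ratios FP/(FP+TP) for the proportion of false discoveries and FP/(FP+TN) for the false positive rate. The counts are invented so that the test level (0.05) and power (0.8) stay fixed while the share of true nulls changes; the function names are hypothetical.

```python
def false_discovery_proportion(fp, tp):
    """FP / (FP + TP): share of rejections that are false; 0 if nothing is rejected."""
    rejections = fp + tp
    return fp / rejections if rejections > 0 else 0.0


def false_positive_rate(fp, tn):
    """FP / (FP + TN): share of true nulls that are rejected."""
    true_nulls = fp + tn
    return fp / true_nulls if true_nulls > 0 else 0.0


# Scenario A (hypothetical): 900 true nulls, 100 false nulls.
fdp_a = false_discovery_proportion(fp=45, tp=80)
fpr_a = false_positive_rate(fp=45, tn=855)

# Scenario B (hypothetical): 980 true nulls, 20 false nulls, same level and power.
fdp_b = false_discovery_proportion(fp=49, tp=16)
fpr_b = false_positive_rate(fp=49, tn=931)

print(fpr_a, fdp_a)
print(fpr_b, fdp_b)
```

In both scenarios the false positive rate is the same, while the proportion of false discoveries grows as true nulls become more common, which is the sense in which it depends on the prior mix of hypotheses.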

Lead

Could somebody with knowledge of this subject add something to the lead that explains how FDR is a basic statistical measure (FP/(TP+FP)) as well as a method to control for multiple hypothesis testing? –CWenger (^@) 06:31, 26 July 2011 (UTC)

Definition of the FDR and some other points

The FDR is defined as the expectation of the ratio V/R, and it is said that one might want to keep this expectation under a threshold value α: E(V/R) ≤ α. However, this inequality makes sense only if we know with respect to which probability measure the integral on the LHS is taken. The problem is that we have many null hypotheses and hence many ways to choose this probability measure. For the FDR to be a relevant indicator to control in hypothesis testing, I guess the expectation should be evaluated for probability distributions that satisfy the nulls which are assumed to be true. This ambiguity about the expectation should be removed from the definition of the FDR.

Next, V (for instance) is defined as the number of type I errors (i.e. the number of true nulls that are rejected). It is said to be an unobserved random variable. Everyone would agree it is unobserved, as we don't know the number of true nulls (nor which ones are true). Now consider the "event" {V=k} (exactly k true nulls are rejected). I claim this is not an event, in the sense that it cannot be written as an inverse image under a measurable function of the data. To tell whether {V=k} has been realized, we need information about the probability distribution of the data (namely whether or not this probability distribution satisfies the rejected nulls). If this is so, V is not a random variable for the random experiment under consideration. --194.57.219.138 (talk) 08:20, 17 February 2012 (UTC)

Do you have a reference for a paper discussing these issues? Tal Galili (talk) 20:15, 15 February 2013 (UTC)
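For reference while this is discussed, the definition as I understand it from Benjamini and Hochberg (1995), written out with the convention for R = 0. The symbols m_0 (number of true nulls) and m (total number of hypotheses) follow that paper; this is a restatement and does not by itself settle the measurability question raised above.

```latex
Q =
\begin{cases}
  V / R, & R > 0, \\
  0,     & R = 0,
\end{cases}
\qquad R = V + S,
\qquad
\mathrm{FDR} = \mathbb{E}[Q]
             = \mathbb{E}\!\left[ \frac{V}{R} \,\middle|\, R > 0 \right] \Pr(R > 0).
```

In that paper the control statement is made for independent test statistics and for any configuration of false null hypotheses, with the bound \mathbb{E}[Q] \le (m_0 / m)\,\alpha \le \alpha, so the expectation is taken under the actual joint distribution of the data rather than under a distribution in which every null holds.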

Moving parts of "Classification of m hypothesis tests"

I just want to state that this section's content may be useful in other articles, but it should not (IMHO) be removed from the current article, since it is so basic to the way FDR is defined/explained. Cheers, Tal Galili (talk) 20:19, 15 February 2013 (UTC)

Adding "related statistics" section

A colleague pointed out to me that the following two:

  • Fdr(z), defined as: Fdr(z) = \frac{p_0 F_0(z)}{F(z)}
  • fdr(z), the local fdr, defined as: fdr(z) = \frac{p_0 f_0(z)}{f(z)}

are not error rates, but in fact statistics which control some other error rates.

A contribution by future editors may be to check this claim carefully and, if it holds, to determine which error rates these statistics relate to.

Also, it is worth checking how the different error rates relate to one another.

Another note from my colleague: recall that Fdr = mFDR (marginal FDR); see Genovese & Wasserman.

Tal Galili (talk) 18:13, 8 March 2013 (UTC)
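To help anyone checking the two formulas, a minimal sketch that evaluates them in a two-group mixture where F = p_0 F_0 + (1 - p_0) F_1. Everything specific here is an assumption for illustration: a standard normal null, an alternative centred at -3 with unit variance, p_0 = 0.9, and the function and constant names. With real data p_0, F_0 and F would have to be estimated.

```python
import math

P0 = 0.9    # assumed prior proportion of true nulls
MU1 = -3.0  # assumed mean of the alternative component


def norm_pdf(z, mu=0.0):
    """Density of a unit-variance normal with mean mu."""
    return math.exp(-0.5 * (z - mu) ** 2) / math.sqrt(2.0 * math.pi)


def norm_cdf(z, mu=0.0):
    """Distribution function of a unit-variance normal with mean mu."""
    return 0.5 * (1.0 + math.erf((z - mu) / math.sqrt(2.0)))


def tail_Fdr(z):
    """Fdr(z) = p0 * F0(z) / F(z), using left-tail distribution functions."""
    F0 = norm_cdf(z)
    F = P0 * F0 + (1.0 - P0) * norm_cdf(z, MU1)
    return P0 * F0 / F


def local_fdr(z):
    """fdr(z) = p0 * f0(z) / f(z), using densities."""
    f0 = norm_pdf(z)
    f = P0 * f0 + (1.0 - P0) * norm_pdf(z, MU1)
    return P0 * f0 / f


for z in (-4.0, -3.0, -2.0, 0.0):
    print(z, tail_Fdr(z), local_fdr(z))
```

Both quantities are computed here from a fully known mixture, so the sketch says nothing about how well estimated versions control any particular error rate.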

RFDR - removed (until proper citation is added)

The following section was removed from the article since it does not include proper citation:

Note that the mean \alpha for these m tests is \frac{\alpha(m+1)}{2m} which could be used as a rough FDR, or RFDR, "\alpha adjusted for m independent (or positively correlated, see below) tests". The RFDR calculation shown here provides a useful approximation and is not part of the Benjamini and Hochberg method; see AFDR below.

When I asked an expert in the field about the above quote, he wrote:

I have no idea where this appeared. I do not understand in what sense it may be argued to be correct. It is also not related to the adaptive FDR.

If someone can give proper citation then please restore the above paragraph to the main article. Tal Galili (talk) 14:59, 20 September 2013 (UTC)


Another paragraph, removed from the "Benjamini–Hochberg–Yekutieli procedure" section:

Using RFDR and the second formula above, an approximate FDR, or AFDR, is the min(mean \alpha) for m dependent tests = RFDR / \sum_{i=1}^{m} \frac{1}{i}.

No citation is needed. The first test is for (1/m)alpha, the last is for (m/m)alpha, and the mean is simply the midpoint for that list of test criteria, i.e.,

[alpha(1/m) + alpha(m/m)]/2 = (alpha(1+m)/m)/2 = alpha(m+1)/(2m). -Attleboro (talk) 18:18, 11 October 2013 (UTC)


Attleboro - since the sentence says this is a "rough estimate", I would like to see this given with a citation. I will add a "citation needed" instead of removing the text.
No single value can be a "rough estimate" for FDR. Call it what it is, the mean value of alpha. Attleboro (talk) 19:37, 14 October 2013 (UTC)
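The arithmetic in this thread can be checked exactly; the sketch below only verifies that the mean of the thresholds i * alpha / m for i = 1, ..., m equals the midpoint of the first and last threshold and the closed form alpha (m + 1) / (2m). It takes no position on whether that mean is a useful summary. The function name and the values alpha = 1/20, m = 10 are hypothetical.

```python
from fractions import Fraction


def mean_threshold(alpha, m):
    """Exact mean of the m thresholds i * alpha / m, for i = 1, ..., m."""
    return sum(Fraction(i, m) * alpha for i in range(1, m + 1)) / m


alpha = Fraction(1, 20)  # alpha = 0.05, kept exact
m = 10

mean = mean_threshold(alpha, m)
midpoint = (alpha / m + alpha) / 2        # [alpha(1/m) + alpha(m/m)] / 2
closed_form = alpha * (m + 1) / (2 * m)   # alpha (m + 1) / (2m)

print(mean, midpoint, closed_form)
```

Because the thresholds form an arithmetic sequence, the mean and the midpoint agree for every m, not just for the value used here.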