# Talk:Null hypothesis

WikiProject Statistics (Rated C-class, High-importance)

WikiProject Mathematics (Rated C-class, Mid-importance)
Field: Probability and statistics
One of the 500 most frequently viewed mathematics articles.
WikiProject Guild of Copy Editors
A version of this article was copy edited by a member of the Guild of Copy Editors.

## References

Seeing as how this is based on a scientific, or at least scholarly topic, I believe this needs references. I'm (I hope appropriately) adding the "Not Verified" tag, so hopefully someone will come through and add references so the poor, confused beginning stats students (myself included) can know this is more than the ramblings of a demented mind. Having had experience using the subject matter, even reading this straight from a textbook can make one crazy. Garnet avi 10:41, 5 September 2006 (UTC)

I think the main thing that the article misses is that the null hypothesis is always the hypothesis that there is no difference between two (or I suppose more) groups. Thus the word "null". Generally speaking, when studying something, you are trying to establish a difference between two groups (i.e. that group A, which received medication, did better than group B, which did not). It is statistically convenient (as well as philosophically convenient) to always start from the same premise. 24.45.14.133 04:44, 14 February 2007 (UTC) nullo

Sorry, this stuff was above the contents but was not grouped or titled. I thought it would be more appropriate under the contents, so I moved it there. Garnet avi 10:41, 5 September 2006 (UTC)

Is this sentence, from the article, correct?

But if the null hypothesis is that sample A is drawn from a population whose mean is no lower than the mean of the population from which sample B is drawn, the alternative hypothesis is that sample A comes from a population with a larger mean than the population from which sample B is drawn, and we will proceed to a one-tailed test.

It seems as if the null hypothesis says that mean(A) >= mean(B). Therefore the alternative hypothesis should be the negation of this, or mean(A) < mean(B). But the text states that the alternative hypothesis is that mean(A) > mean(B). Is this right?

I agree. Fixed. --Bernard Helmstetter 19:54, 8 Jan 2005 (UTC)
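For reference, the pair of hypotheses discussed in this thread can be written out explicitly. The symbols $\mu_A$ and $\mu_B$ (the means of the populations from which samples A and B are drawn) are notation assumed here for illustration, not taken from the article:

```latex
\begin{align*}
H_0 &: \mu_A \ge \mu_B \\
H_1 &: \mu_A < \mu_B \quad \text{(the negation of } H_0\text{)}
\end{align*}
```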

The difference between H0:μ1 = μ2 and H0:μ1 - μ2 = 0 is unclear, to say the least. --Bernard Helmstetter 20:02, 8 Jan 2005 (UTC)

This entry is confusing, to say the least. The introduction is somehow split in two sections by the TOC, and paradoxically is too short. The closing sections on controversies and publication bias could be merged as well. I am not attempting a rewrite for I know little about statistics myself - but even so it is evident that the article could be clearer.--Duplode 01:20, 4 April 2006 (UTC)

The sentence "However, formulating the null hypothesis before collecting data rejects a true null hypothesis only a small percent of the time." seems to imply that the small percent of the time that a true null hypothesis is rejected is different from the small number of true null hypotheses which would be rejected if tested on the same dataset. It implies that testing many hypotheses on a single dataset will produce less accurate results, with a justification which very much resembles the false statement, "after flipping a coin 4 times and getting heads each time, the odds of getting heads on the fifth flip are very low." 98.114.27.182 (talk) 17:58, 1 May 2011 (UTC)

## ??

I'm a sophomore in high school, here's my request:

I agree. This is unreadable. Behaviour of sets of data indeed! Nonsense. Can someone with scientific authority define it here please and just delete the rest of this garbage? —Preceding unsigned comment added by 90.179.192.118 (talk) 19:02, 5 October 2009 (UTC)

"Null hypothesis for dummies" would be useful. In the examples there are null hypotheses stating that "the value of this real number is the same as the value of that real number". Is there some explanation for why such a hypothesis is reasonable? It seems to me that for a very broad class of probability distributions the null hypothesis has probability of 0 and the opposite probability of 1. The article at the moment says this:

However, concerns regarding the high power of statistical tests to detect differences in large samples have led to suggestions for re-defining the null hypothesis, for example as a hypothesis that an effect falls within a range considered negligible. This is an attempt to address the confusion among non-statisticians between significant and substantial, since large enough samples are likely to be able to indicate differences however minor.

So the more data we have, the more likely it is that the null hypothesis is rejected? This is exactly what should happen if the null hypothesis is always false - the only difference is in how much data we need to prove that. Is this the case in actual use? If so, how does the theory justify drawing conclusions from a false premise? Presumably the theory is "robust enough" when there isn't "too much data", but how exactly does this work? 82.103.214.43 14:58, 11 June 2006 (UTC)
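The effect described in this comment can be illustrated numerically. The sketch below assumes a two-sided one-sample z-test with a known standard deviation; the true shift of 0.01 standard deviations, the 5% level and the function name `power_z_test` are illustrative choices, not anything taken from the article. It computes the probability of rejecting a point null hypothesis when the true difference is tiny but non-zero.

```python
from math import sqrt
from statistics import NormalDist


def power_z_test(effect, n, alpha=0.05):
    """Probability that a two-sided z-test rejects H0: mu = 0
    when the true mean is `effect` (in units of the known SD)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * sqrt(n)  # mean of the z statistic under the true effect
    # H0 is rejected when z > z_crit or z < -z_crit.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, round(power_z_test(0.01, n), 3))
```

Under these assumptions the rejection probability stays close to the nominal 5% for small samples and approaches 1 for very large ones, which is the behaviour the quoted passage about "significant" versus "substantial" is concerned with.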


## example conclusion

"For example, if we want to compare the test scores of two random samples of men and women, a null hypothesis would be that the mean score of the male population was the same as the mean score of the female population, and therefore there is no significant statistical difference between them:"

This is wrong, the two samples can have the same mean and be statistically totally different (e.g. differ in variance). 84.147.219.67 15:56, 26 June 2006 (UTC)

I made some changes: I deleted "and therefore there is no significant statistical difference between them:", because it is redundant and arguably incorrect. I also added a few words to the part about assuming they're drawn from the same population, to say that this means they have the same variance and shape of distribution too. I deleted the equation with mu1 - mu0 = 0 because it was out of context IMO given the sentence that was just before it, and because it is practically the same as the previous equation mu1 = mu0. Sorry I forgot again to put an "edit summary". Coppertwig 00:14, 5 November 2006 (UTC)
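The objection raised in this section is easy to check with a toy example. The two samples below are made-up numbers chosen only for illustration: they share the same mean but have very different variances.

```python
from statistics import mean, variance

# Two made-up samples, chosen only for illustration.
sample_a = [49, 50, 50, 50, 51]
sample_b = [10, 30, 50, 70, 90]

# Both means are 50, so a test on the means alone sees no difference.
print(mean(sample_a), mean(sample_b))

# The sample variances differ by a large factor.
print(variance(sample_a), variance(sample_b))
```

Equal means therefore do not by themselves show that the two samples come from the same population.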

## "File drawer problem"?

What is it, and why does it make a sudden and unexplained appearance near the end of this article? If I hadn't gotten a C- in stats I'd go out and fix it myself. :) --User:Dablaze 13:29, 1 August 2006 (UTC)

The "file drawer problem" is this: suppose a researcher carries out an experiment and does not find any statistically significant difference between two populations. (For example, tests whether a certain substance cures a certain illness and does not find any evidence that it does.) Then, the researcher may consider that this result (or "non-result") is not very interesting, and put all the notes about it into a file drawer and forget about it, instead of publishing it which is what the researcher would have done if the test had found the interesting result that the substance apparently cures the illness.
Not publishing it is a problem for several reasons: one, other researchers may waste time carrying out the same test on a useless substance and also not publishing. Two, it is sometimes possible to find a statistically significant result by combining the results of several studies; this can't easily happen if it isn't published so nobody knows about it. Three, if various researchers keep repeating the same experiment and not finding statistically significant results, and then one does the same experiment and by a random fluke (luck) does get a statistically significant result, they might publish that and it would look as if the substance cures the illness, although if you combined the results of all the studies you would see that there is no statistically significant result overall.
It really does make sense if you can guess what "file drawer problem" means. Does it need a few words in the article to explain it? Coppertwig 00:00, 5 November 2006 (UTC)
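The third reason above can be put in rough numbers. This is a minimal sketch that assumes the studies are independent, that the null hypothesis is true in every one of them, and that each has exactly a 5% chance of a false positive; the function name is hypothetical.

```python
def prob_at_least_one_false_positive(k, alpha=0.05):
    """Chance that at least one of k independent tests of a true
    null hypothesis comes out significant at level alpha."""
    return 1 - (1 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 5, 20, 50):
        print(k, round(prob_at_least_one_false_positive(k), 3))
```

Under those assumptions, the more often an experiment on a useless substance is repeated, the more likely it is that some lab obtains a fluke "significant" result. If only that one is published, the literature is misleading.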

## Accept, reject, do not reject Null Hypothesis

After a statistical test (say, determining p-values), one can only reject or not reject the Null Hypothesis. Accepting the alternative hypothesis is wrong because there is always a probability that you are incorrectly accepting or rejecting (alpha and beta; type I and type II error). --70.111.218.254 02:03, 22 November 2006 (UTC)

Actually, it seems that the first paragraph is entirely confusing. One can not ACCEPT null hypothesis. One can only REJECT or FAIL TO REJECT it. On the other hand, one can ACCEPT alternative hypothesis or FAIL TO ACCEPT it. See D. Gujarati: Basic Econometrics, Fourth Edition, 2004, p.134 --- Argyn

Besides type I and type II error, there's a problem which remains big even when your statistical significance is excellent: that both the Null Hypothesis and the Alternative Hypothesis can be false. I suppose they usually are; they are usually at best oversimplifications (models) of a situation in the real world. That's why the alternative hypothesis is merely "accepted", not "proven" nor "shown" nor "established". However, it can be "shown" or "established", with a certain statistical significance level, that the null hypothesis is false. --Coppertwig 10:29, 15 February 2007 (UTC)

I object to the statement "a null hypothesis (H0) is a hypothesis (scenario) set up to be nullified, refuted, or rejected ('disproved' statistically) in order to support an alternative hypothesis." The point of a null hypothesis is to represent what we currently expect. If the data in an experiment is not sufficient to reject that hypothesis, there is no point in considering an alternative hypothesis. Testing a null hypothesis is a form of Occam's razor--why consider a new, alternative hypothesis when the data is plausibly explained by an existing one (the null hypothesis)? So the null hypothesis is NOT "set up to be rejected"; it is set up to be _tested_. A null hypothesis represents an existing theory that may explain a given set of data. —Preceding unsigned comment added by Maxbox51 (talk • contribs) 17:44, 9 October 2008 (UTC)

## Formulation of null hypotheses

This article appears to be a little confused at the moment — I would appreciate a little discussion before I make some changes. In particular...

"if the null hypothesis is that sample A is drawn from a population whose mean is lower than the mean of the population from which sample B is drawn, the alternative hypothesis is that sample A comes from a population with a higher mean than the population from which sample B is drawn, which can be tested with a one-tailed test."

I believe this to be misleading. A null hypothesis is a statement of no effect — by definition it has no directionality. There is a very good reason for this: null hypothesis testing works by first assuming the null hypothesis to be true, and then calculating how often we would expect to see results as extreme as those observed even when the null hypothesis is true. That is, we are trying to find out how often the observed results would be obtained by chance.

It is only possible to do this when we have a well-defined null hypothesis — e.g. when it states that one mean is equal to another mean, or when a mean is equal to a defined value. It would not be possible to calculate our test statistic if our null hypothesis merely said, "Mean one is less than mean two", and indeed this would not be a null hypothesis.

I think the confusion arises in the case of a one-tailed test. Take, for example, an experiment investigating the height of men and women in a class. We might wish to test the hypothesis "that men are taller than women". In this case our hypotheses are as follows:

• Null: That men and women are of equal height.
• Experimental: That men are taller (have greater height) than women.

In this case, we have defined our experimental hypothesis in a one-tailed form. The question many people ask is, "But what if women are taller than men? Surely neither of our hypotheses addresses this?". The confusion then lies in whether or not the null hypothesis should incorporate this possibility. To the very best of my knowledge, it should not: the null hypothesis remains a statement of no effect.

The reason for this is that we are looking to see whether there is evidence to support the specific experimental hypothesis that we have postulated. If we find our results to be non-significant, this tells us that we do not have sufficient evidence to accept our specific experimental hypothesis. If it turns out that we're interested in a difference that we find in the other direction, well that suggests that we should have proposed a two-tailed hypothesis in the first place. Indeed, I would argue that it is very rare indeed that a one-tailed hypothesis is appropriate: we are almost always interested in results in the other direction from that predicted.

Does this sound sensible? If so, then I will modify the article accordingly, and will add some relevant citations! -- Sjb90 16:34, 15 May 2007 (UTC)
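The distinction drawn in this comment can be sketched as follows. In both tests the null hypothesis is the same exact statement (no difference), and only the alternative changes. The sketch assumes the test statistic is standard normal under the null hypothesis; the value 1.8 and the function name `p_values` are made up for illustration.

```python
from statistics import NormalDist


def p_values(z):
    """Return (one-tailed, two-tailed) p-values for a z statistic,
    where the one-tailed alternative is 'greater than'."""
    std_normal = NormalDist()
    one_tailed = 1 - std_normal.cdf(z)
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return one_tailed, two_tailed


if __name__ == "__main__":
    for z in (1.8, -1.8):
        one, two = p_values(z)
        print(z, round(one, 3), round(two, 3))
```

A result in the predicted direction gives a smaller one-tailed than two-tailed p-value. The same result in the opposite direction gives a one-tailed p-value close to 1, which is why a one-tailed test cannot pick up an unexpected effect.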

OK, I will start making some changes to this later. This has been quite a complicated issue to resolve, and has involved going back to the paper that first defined the term 'null hypothesis'. Full discussion can be found on Lindsay658's talk page. -- Sjb90 11:06, 18 May 2007 (UTC)

## Earlier Null hypothesis Discussion

I thought that the following should appear here (originally at [1]) for the ease of others. Lindsay658 (talk) 21:21, 22 February 2008 (UTC)

Hi there,

Over on the null hypothesis talk page, I've been canvassing for opinions on a change that I plan to make regarding the formulation of a null hypothesis. However I've just noticed your excellent edits on Type I and type II errors. In particular, in the null hypothesis section you say:

The consistent application by statisticians of Neyman and Pearson's convention of representing "the hypothesis to be tested" (or "the hypothesis to be nullified") with the expression Ho -- associated with an increasing tendency to incorrectly read the expression's subscript as a zero, rather than an "O" (for "original") -- has led to circumstances where many understand the term "the null hypothesis" as meaning "the nil hypothesis". That is, they incorrectly understand it to mean "there is no phenomenon", and that the results in question have arisen through chance.

Now I know the trouble with stats in empirical science is that everyone is always feeling their way to some extent -- it's an inexact science that tries to bring sharp definition to the real world! But I'm really intrigued to know what you're basing this statement on -- I'm one of those people who has always understood the null hypothesis to be a statement of null effect. I've just dug out my old undergrad notes on this, and that's certainly what I was taught at Cambridge; and it's also what my stats reference (Statistical Methods for Psychology, by David C. Howell) seems to suggest. In addition, whenever I've been an examiner for public exams, the markscheme has tended to state the definition of a null as being a statement of null effect.

I'm a cognitive psychologist rather than a statistician, so I'm entirely prepared to accept that this may be a common misconception, but was wondering whether you could point me towards some decent reference sources that try to clear this up, if so! —The preceding unsigned comment was added by Sjb90 (talk • contribs) 11:07, 16 May 2007 (UTC).

Sjb90 . . . There are three papers by Neyman and Pearson:
• Neyman, J. & Pearson, E.S., "On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference, Part I", reprinted at pp.1-66 in Neyman, J. & Pearson, E.S., Joint Statistical Papers, Cambridge University Press, (Cambridge), 1967 (originally published in 1928).
• Neyman, J. & Pearson, E.S., "The testing of statistical hypotheses in relation to probabilities a priori", reprinted at pp.186-202 in Neyman, J. & Pearson, E.S., Joint Statistical Papers, Cambridge University Press, (Cambridge), 1967 (originally published in 1933).
• Pearson, E.S. & Neyman, J., "On the Problem of Two Samples", reprinted at pp.99-115 in Neyman, J. & Pearson, E.S., Joint Statistical Papers, Cambridge University Press, (Cambridge), 1967 (originally published in 1930).
Unfortunately, I do not have these papers at hand and, so, I can not tell you precisely which of these papers was the source of this statement; but I can assure you that the statement was made on the basis of reading all three papers. From memory, I recall that they were quite specific in their written text and in their choice of mathematical symbols to stress that it was O for original (and not 0 for zero). Also, from memory, I am certain that the first use of the notion of a "null" hypothesis comes from:
• Fisher, R.A., The Design of Experiments, Oliver & Boyd (Edinburgh), 1935.
And, as I recall, Fisher was adamant that whatever it was to be examined was the NULL hypothesis, because it was the hypothesis that was to be NULLIFIED.
I hope that is of some assistance to you.
It seems that it is yet one more case of people citing citations that are also citing a citation in someone else's work, rather than reading the originals.
The second point to make is that the passage you cite from my contribution was 100% based on the literature (and, in fact, the original articles).
Finally, and this comment is not meant to be a criticism of anyone in particular, simply an observation, I came across something in social science literature that mentioned a "type 2 error" about two years ago. It took me nearly 12 months to track down the source to Neyman and Pearson's papers. I had many conversations with professional mathematicians and statisticians and none of them had any idea where the notion of Type I and type II errors came from and, as a consequence, I would not be at all surprised to find that the majority of mathematicians and statisticians had no idea of the origins and meaning of "null" hypothesis.
I'm not entirely certain, but I have a feeling that Fisher's work -- which I cited as "Fisher (1935, p.19)", and that reference would be accurate -- was an elaboration and extension of the work of Neyman and Pearson (and, as I recall, Fisher completely understood that it was an oh, rather than a zero, in the subscript). Sorry I can't be of any more help. The collection that contains the reprints of Neyman and Pearson's papers and the book by Fisher should be fairly easy for you to find in most university libraries. Lindsay658 22:37, 16 May 2007 (UTC)
Thanks for the references, Lindsay658 -- I'll dig them out, and have a bit of a chat with my more statsy colleagues here, and will let you know what we reckon. I do agree that it's somewhat non-ideal that such a tenet of experimental design is described rather differently in a range of texts!
As a general comment, I think it entirely acceptable for people working in a subject, or writing a subject-specific text book / course to read texts more geared towards their own flavour of science, rather than the originals. After all, science is built upon the principle that we trust much of the work created by our predecessors, until we have evidence to do otherwise, and most of these derived texts tend to be more accessible to the non-statistician. However I agree that, when writing for e.g. Wikipedia, it is certainly useful to differentiate between 'correct' and 'common' usage, particularly when the latter is rather misleading. This is why your contribution intrigued me so -- I look forward to reading around this and getting back to you soon -- many thanks for your swift reply! -- Sjb90 07:39, 17 May 2007 (UTC)

OK, I've now had a read of the references that you mentioned, as well as some others that seemed relevant. Thanks again for giving me these citations -- they were really helpful. This is what I found:
• First of all, you are quite right to talk of the null hypothesis as the 'original hypothesis' -- that is, the hypothesis that we are trying to nullify. However Neyman & Pearson do in fact use a zero (rather than a letter 'O') as the subscript to denote a null hypothesis. In this way, they show that the null hypothesis is merely the original in a range of possible hypotheses: H0, H1, H2 ... Hi.
• As you mentioned, Fisher introduced the term null hypothesis, and defines this a number of times in The Design of Experiments. When talking of an experiment to determine whether a taster can successfully discriminate whether milk or tea was added first to a cup, Fisher defines his null hypothesis as "that the judgements given are in no way influenced by the order in which the ingredients have been added ... Every experiment may be said to exist only in order to give the facts a chance of disproving the null hypothesis."
• Later, Fisher talks about fair testing, namely in ensuring that other possible causes of differentiation (between the cups of tea, in this case) are held fixed or are randomised, to ensure that they are not confounds. By doing this, Fisher explains that every possible cause of differentiation is thus now i) randomised; ii) a consequence of the treatment itself (order of pouring milk & tea), "of which on the null hypothesis there will be none, by definition"; or iii) an effect "supervening by chance".
• Furthermore, Fisher explains that a null hypothesis may contain "arbitrary elements" -- e.g. in the case where H0 is "that the death-rates of two groups of animal are equal, without specifying what those death-rates actually are. In such cases it is evidently the equality rather than any particular values of the death-rates that the experiment is designed to test, and possibly to disprove."
• Finally, Fisher emphasises that "the null hypothesis must be exact, that is free from vagueness and ambiguity, because it must supply the basis of the 'problem of distribution,' of which the test of significance is the solution". He gives an example of a hypothesis that can never be a null hypothesis: that a subject can make some discrimination between two different sorts of object. This cannot be a null hypothesis, as it is inexact, and could relate to an infinity of possible exact scenarios.
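Fisher's requirement that the null hypothesis be exact can be illustrated with the tea-tasting experiment mentioned above. The sketch assumes the design as it is commonly described (eight cups, four of each kind, with the taster asked to pick out the four milk-first cups); those numbers are not given in this discussion. Because the null hypothesis of pure guessing is exact, the full distribution of outcomes can be written down.

```python
from math import comb


def prob_correct(k, cups_per_kind=4):
    """Probability, under the null hypothesis of pure guessing, of
    identifying exactly k of the milk-first cups correctly."""
    total = comb(2 * cups_per_kind, cups_per_kind)  # all equally likely selections
    return comb(cups_per_kind, k) * comb(cups_per_kind, cups_per_kind - k) / total


if __name__ == "__main__":
    for k in range(5):
        print(k, prob_correct(k))
```

An inexact hypothesis such as "the taster can make some discrimination" gives no such distribution, which is the reason Fisher rules it out as a null hypothesis.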
So, where does that leave us? I propose to make the following slight changes to the Type I and type II errors page and the null hypothesis page.
• I will tone down the paragraph about original vs. nil hypotheses: the subscript is actually a zero, but it is entirely correct that the hypothesis should not be read as a "nil hypothesis" -- I agree that it is important to emphasise that the null hypothesis is that one that we are trying to nullify.
• In the null hypothesis article, I will more drastically change the paragraph that suggests that, for a one-tailed test, it is possible to have a null hypothesis "that sample A is drawn from a population whose mean is lower than the mean of the population from which sample B is drawn". As I had previously suspected, this is actively incorrect: such a hypothesis is numerically inexact. The null hypothesis, in the case described, remains "that sample A is drawn from a population with the same mean as sample B".
• I will tone down my original suggestion slightly: A null hypothesis isn't a "statement of no effect" per se, but in an experiment (where we are manipulating an independent variable), it logically follows that the null hypothesis states that the treatment has no effect. However null hypotheses are equally useful in an observation (where we may be looking to see whether the value of a particular measured variable significantly differs from that of a prediction), and in this case the concept of "no effect" has no meaning.
• I'll add in the relevant citations, as these really do help to resolve this issue once and for all!
Thanks again for your comments on this. I will hold back on my edits for a little longer, in case you have any further comments that you would like to add!
-- Sjb90 17:33, 17 May 2007 (UTC)
I agree with your changes. As you can see from [[2]], [[3]], [[4]], and [[5]], I really didn't have a lot to work with.

I believe that it might be helpful to make some sort of comment to the effect that when statisticians work -- rather than scientists, that is -- they set up a question that is couched in very particular terms and then try to disprove it (and, if it can not be disproved, the proposition stands, more or less by default).
Statisticians essentially couch the research question as the polar opposite of what they actually believe to be the case, whereas scientists generally couch it in terms of what they actually believe. This notion, which is counter-intuitive to ordinary people, and the precise way in which statisticians contemplate a "null hypothesis", is something that someone like you could describe far better than I could. I also believe that it would be extremely informative to the more general reader. All the best in your editing. If you have any queries, please contact me again. Lindsay658 21:49, 17 May 2007 (UTC)
Just a note to say that I have finally had the chance to sit down and word some changes to the Null hypothesis article and the section on Type_I_and_type_II_errors#The_null_hypothesis. Do shout and/or make changes if you think my changes are misleading/confusing! -- Sjb90 11:33, 14 June 2007 (UTC)

## Analogy to proof by contradiction

I noticed the request to simplify (above), and had the idea of inserting something into the lede that would relate this idea to a proof by contradiction. For example, "This is similar to the idea of a proof by contradiction, but instead of a definite proof, experimental data is used to show that the null hypothesis is very unlikely to be true.". I'm not entirely sure if that wording is clear enough, or perhaps there's some imprecision; suggestions? —AySz88\^-^ 22:48, 13 March 2008 (UTC)

The language used in this article is extremely unclear and vague if not outright confusing. Will someone familiar with the subject please edit it, making sure that the text and subject matter have flow? This is a very badly compiled text so far. The exact definition for the null hypothesis seems to have been forgotten initially, and vague examples are presented to illustrate an undefined term. Thanks

### Hypothesis Testing Definition

I came to this page trying to understand the concept. Maybe somebody more learned than I can make the following statement more readily accessible to the layman... "Hypothesis testing works by collecting data and measuring how probable the data are, assuming the null hypothesis is true." —Preceding unsigned comment added by Jdownie (talk • contribs) 12:19, 14 October 2010 (UTC)
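One way to make the quoted statement concrete is a coin-flipping example. This is a minimal sketch, assuming the null hypothesis is "the coin is fair" and the alternative is "the coin favours heads"; the function name and the figures (9 heads in 10 flips) are made up for illustration.

```python
from math import comb


def binomial_p_value(heads, flips, p_null=0.5):
    """One-sided p-value: probability of seeing at least `heads`
    heads in `flips` flips if the null hypothesis P(heads) = p_null holds."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


if __name__ == "__main__":
    print(binomial_p_value(9, 10))
```

If the coin really is fair, 9 or more heads in 10 flips turns up only about 1% of the time (11 outcomes out of 1024). Data that improbable under the null hypothesis is what leads to rejecting it.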

## The initial paragraph

I think that the last but one sentence is unclear and the last one is wrong. Therefore, I suggest that the end of the initial paragraph:

It is possible for an experiment to fail to reject the null hypothesis. The null hypothesis is never accepted as suspicion always remains over its validity. Failing to reject H0 allows for alternative hypotheses to be developed and tested.

to be changed into this:

It is possible for an experiment to fail to reject the null hypothesis. This does not prove H0 in any way, it only means that it can not be rejected. The best one can do to substantiate H0 is to consider all reasonable alternative hypotheses and reject them.

--Pot (talk) 17:26, 27 January 2009 (UTC)

I just now rewrote the sentence in question, before seeing your similar comments here... AnonMoos (talk) 10:22, 18 February 2009 (UTC)
I think the version above is more clear. What do you think about putting the above one instead of the new one you wrote? --Pot (talk) 12:04, 18 February 2009 (UTC)
The version above is not good. It almost says to try as many alternative models as possible ...and if enough alternatives are tried one can then always reject the null hypothesis, unless of course some effort is put into taking proper account of having done the multiple tests. In any case, some of these ideas should more properly be stated in the article on hypothesis testing. Melcombe (talk) 13:10, 18 February 2009 (UTC)
There was a problem with the wording "the null hypothesis is never accepted"; this may be correct according to some definitions of technical terminology, but it would fail to convey much useful meaning to those who aren't already knowledgeable. Your revisions are OK on this point... AnonMoos (talk) 15:27, 18 February 2009 (UTC)

## H0 or H0?

I suggest that an upright H0 be used in place of the italic (slanted) H0. Same for H1 and similar ones in the article. This is not a variable, so probably its name should not be slanted. --Pot (talk) 16:15, 28 January 2009 (UTC)

## Merging

Someone added this in the main article: {{Mergeto|Statistical hypothesis testing|Talk:Null hypothesis|date=August 2008}}

why? I cannot see a discussion on this topic. Maybe it should be deleted? --Pot (talk) 16:31, 28 January 2009 (UTC)

I have adjusted the template to point here rather than to Talk:Statistical hypothesis testing as there was no discussion there at all. Melcombe (talk) 10:06, 29 January 2009 (UTC) And I have added a corresponding template to Statistical hypothesis testing. Melcombe (talk) 10:17, 29 January 2009 (UTC)

Reasons for merging would be
• The section uses term "null hypothesis testing", and refers to criticisms of this, when it is no different from "Statistical hypothesis testing" and this already has its own section on criticism ... so all the criticism should be together.
• The article is supposedly about the idea of a "Null hypothesis" and its role in testing, not about the more general idea of "null hypothesis testing" which is covered in "Statistical hypothesis testing" ...so the whole section has no relevance here.
Melcombe (talk) 10:24, 29 January 2009 (UTC)
Do you mean that only the Null_hypothesis#Controversy section should be merged? How to do that? --Pot (talk) 12:05, 29 January 2009 (UTC)
Well I think the publication-bias stuff should be moved elsewhere also, and that it would be better off in Statistical hypothesis testing. However there may be better ways of restructuring all that needs to be included, possibly by splitting it off into a separate article for the sophisticates. Melcombe (talk) 13:42, 29 January 2009 (UTC)

I have moved the blocks of text on criticism and publication bias to Statistical hypothesis testing. Melcombe (talk) 15:31, 10 March 2009 (UTC)

## Introductory definition.

Could I suggest that the concept of the null hypothesis be introduced in the article as this:

"something not proven, yet not contradicted by the data"

It can then be elaborated upon. If people agree that this definition is not a misrepresentation, then it may be less cryptic and more understandable to a general readership.

Feedback anyone? 121.73.7.84 (talk) 11:03, 28 May 2009 (UTC)

No, this is not accurate. You have some data. Based on your experience, you make a null hypothesis (e.g. the data are Gaussian). At this stage, the hypothesis may well be contradicted by the data, but you have not yet checked. At the next stage, you do statistical hypothesis testing. You analyse the results: if the data do not contradict the null hypothesis at the chosen significance level, then you do not reject it. Only at this stage is your statement true. --Pot (talk) 11:46, 29 May 2009 (UTC)

OK, that makes sense. I've looked the definition up in different dictionaries and found varying definitions, so the clarification is useful. My Oxford dictionary states: a hypothesis suggesting that the difference between statistical samples does not imply a difference between populations. Your definition is clearer.

121.73.7.84 (talk) 02:16, 31 May 2009 (UTC)

The current introductory definition may seem right to people who already know what a null hypothesis is. It is certainly of no use to anybody else. In reading this talk page I noticed some remarks that might help produce a definition for the general reader - remarks regarding "no effect" and "no" difference from what was already supposed. Something like that might bring into relief the reason for the word "null" in "null hypothesis." The Tetrast (talk) 15:41, 9 July 2009 (UTC).

I think that the text introduced by an anonymous user and deleted by user:Melcombe was headed towards the right direction, even if incorrect. The text changed the first line from "formally describes some aspect of the statistical behaviour of a set of data" into "formally describes the behaviour of a set of data as unbiased or normal". What about something like "formally describes a statistical hypothesis made on a set of data"?
I think we need something that is more clear than the current wording to people not very accustomed to statistics. Something that, in the first line, may not be formally exhaustive, but that explains the main concept as concretely as possible. --Pot (talk) 11:32, 9 September 2009 (UTC)
I think the problem here (and in the section below) is in trying to do too much in a lead-in, when there need to be two things: a simple-to-understand lead-in and a more formal/complete introduction section containing a good definition. An aside on Pot's comment is that the immediately preceding edit (19:45, 20 August 2009) had created the present first two sentences out of a longer, more meaningful single sentence which might be worth going back to. Before moving towards expanded definitions, it would be good to first consider whether this article should be merged with alternative hypothesis, as null and alternative are almost always considered together and as there is no necessity to have separate articles for every single item of statistical terminology. Such a joining might even make it easier to construct good definitions. Of course it would be important not to duplicate what is/should be in significance testing and/or Statistical hypothesis testing. Actually the last of these contains a lot of definition for "null hypothesis". A reason for not relying on "statistical hypothesis" to provide meaning in the first sentence is that it relies on readers knowing what this is. Melcombe (talk) 12:53, 9 September 2009 (UTC)

## Why "null"?

I think the article should explain in the definition section why the null hypothesis is called "null". There is virtually an unlimited number of "hypotheses" that will correspond to the definition currently provided by the article, while, generally in the context of a specific experiment, the null hypothesis will correspond quite naturally to a well defined situation. For example, if the experiment is to test the efficacy of some clinical intervention, the null hypothesis will normally correspond to absence of efficacy (no difference between treatment group and control group). If the experiment is to see the effect of exposure to a risk factor, the null hypothesis will usually correspond to the "no effect" situation (disease occurrence is equally likely in exposed and non-exposed populations), etc. In general, the null hypothesis tends to represent, basically, the status quo situation. This should be said up-front in the definition. --Dessources (talk) 20:02, 16 July 2009 (UTC)

But null hypotheses are not always used "in the context of a specific experiment". In fact in real-world statistical analyses they are almost never "in the context of a specific experiment". If you look at Cox's list of reasons why a hypothesis test might be done (in hypothesis testing article I think), only 1 out of 6 reasons is directly equivalent to this sort of experiment to test a difference ... but all of them have a null hypothesis of some sort. Melcombe (talk)
I mentioned experiment as an example of an instance of the use of the statistical concept. Experiments are an important application of statistical concepts, in particular in the biomedical field. The current definition is defective in the sense that it does not provide the full meaning of the term "null hypothesis" - it lacks specificity. There are clear situations where no one would hesitate in choosing which situation corresponds to the null hypothesis, and which corresponds to the alternative hypothesis. When assessing the efficacy of a new treatment, clearly the null hypothesis states that the treatment makes no difference, and rejecting it supports the alternative hypothesis that the new treatment is efficacious (or deleterious). Opting for a null hypothesis stating that the new treatment is efficacious or deleterious and an alternative hypothesis stating that the treatment has no effect would surely be confusing and would be an inappropriate use of the concept of null hypothesis. And yet, the definition, as it is currently stated, would not rule out such a possibility. There is a semantic element missing in the definition. Furthermore, this definition is a bit convoluted, and does not correspond to standard usage. Most definitions available on the Web seem to do a better job. Perhaps going back to an authoritative and reliable source might help in finding a wording that is both simpler and more specific than the current definition.
--Dessources (talk) 18:56, 17 July 2009 (UTC)
I think the "semantic reason" for the term you are trying to impute here is not connected to the reason Fisher used the term null hypothesis, so some care would be needed. Melcombe (talk) 13:01, 9 September 2009 (UTC)

## Coin example hypothesis selection

This example is basically saying that with the 5 heads test data it is possible to conclude that "coin is biased (towards heads)", as presumably shown with first selection of null hypothesis, but you cannot conclude that "coin is fair", as presumably shown in the second null hypothesis!? But if you are able to conclude that "coin is biased (to any direction)" then you surely can conclude that "coin is not fair". So the selection of first hypothesis is actually the correct idea as it can conclude what we want to conclude, and the second cannot. Or am I missing something? Yebbey (talk) 18:29, 1 March 2010 (UTC)

The point, I suppose, is that you are not allowed to choose the null hypothesis after seeing the data. This type of hypothesis testing is designed to reject a true null hypothesis at most 5% of the time. You were asked to check if the coin was fair, with no indication of which way it might be biased. 5 heads and 5 tails are equally unlikely outcomes. If you were to conclude that the coin was biased with both these outcomes, you would be concluding that the coin was biased more than 5% of the time, even with an unbiased coin. This is not allowed. On the other hand, if you had prior information that the coin was biased towards heads, and made this your alternative hypothesis before looking at the data, you can make a conclusion. Of course, 5 flips is way too few. In practice, you would do many more. Perhaps the example should be changed, but that requires working out the correct probabilities for (say) 100 flips. —Preceding unsigned comment added by PeterWT (talkcontribs) 16:36, 2 March 2010 (UTC)
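The arithmetic behind the 5-flip argument can be checked directly. The following is a minimal sketch, assuming a fair coin under the null hypothesis and a 5% significance level; the variable names are illustrative only and do not come from the article.

```python
from fractions import Fraction

n_flips = 5
alpha = Fraction(5, 100)  # assumed significance level of 5%

# Probability of 5 heads in 5 flips if the coin is fair.
p_all_heads = Fraction(1, 2) ** n_flips

# Probability of 5 heads OR 5 tails if the coin is fair.
p_all_same = 2 * p_all_heads

# One-sided test (alternative fixed in advance as "biased towards heads").
print("one-sided:", p_all_heads, "reject" if p_all_heads <= alpha else "do not reject")

# Two-sided test (alternative is "biased in either direction").
print("two-sided:", p_all_same, "reject" if p_all_same <= alpha else "do not reject")
```

With these numbers the one-sided probability is 1/32 (about 3.1%) and the two-sided probability is 1/16 (6.25%), so rejecting on either all-heads or all-tails would reject a fair coin more than 5% of the time.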

I'm admittedly not an expert on the null hypothesis, but the logic used in the introduction paragraph is certainly wrong. Say your null hypothesis is, "The coin is not biased," and you decide to flip the coin 1000 times. The probability of the coin coming up heads exactly 500 times is 2.52% -- and this is the most probable outcome, so the probability of the coin coming up heads any given number of times is even less. Therefore, no matter how many times your coin comes up heads, you can conclude by the logic of the current introduction, that the null hypothesis is false. You'll find that your coin is biased no matter what, even if it comes up heads exactly 500 times! Msuperdock (talk) 19:01, 23 June 2010 (UTC)
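The 2.52% figure quoted above can be reproduced with exact binomial arithmetic. This is a minimal sketch, assuming a fair coin; the helper name `fair_coin_pmf` is hypothetical.

```python
from math import comb

def fair_coin_pmf(k, n):
    """Probability of exactly k heads in n flips of a fair coin."""
    return comb(n, k) / 2 ** n

n = 1000
p_500 = fair_coin_pmf(500, n)
print(f"P(exactly 500 heads) = {p_500:.4%}")

# The most probable single outcome is 500 heads; every other count is less likely.
most_likely = max(range(n + 1), key=lambda k: fair_coin_pmf(k, n))
print("most likely number of heads:", most_likely)
```

Because every individual outcome has a small probability, a test cannot be based on the probability of the single observed count; it has to use the total probability of a whole region of outcomes.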

This is a bit late, and I'm not sure if the issue of logic you're addressing is fixed, but this is the purpose of the chi-squared test. Basically, if you get 501 and 499 (for example), you can calculate the probability of your observed result still being due to chance (that is, the probability of the null hypothesis being true) - and that probability will be fairly high. If you go to, say, 510 and 490, the probability of it being due to chance will still be fairly high, but less than for 501 and 499. Generally, we don't reject the null hypothesis until the probability of the data being a random occurrence has dropped below 5% (that is, we have 95% confidence in our results). If it's something very important, we'll wait until 99% confidence or even higher. Arc de Ciel (talk) 00:37, 12 February 2012 (UTC)
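The chi-squared calculation mentioned above can be sketched as follows, assuming a goodness-of-fit test with one degree of freedom against a fair coin. The function name is hypothetical, and the p-value uses the identity that a chi-squared variable with one degree of freedom is the square of a standard normal variable. Note that the p-value is the probability, under the null hypothesis, of a statistic at least this large; it is not the probability that the null hypothesis is true.

```python
from math import erfc, sqrt

def chi2_fair_coin(heads, tails):
    """Chi-squared goodness-of-fit statistic and p-value for a fair coin (1 df)."""
    expected = (heads + tails) / 2
    stat = (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value

for heads, tails in [(501, 499), (510, 490)]:
    stat, p = chi2_fair_coin(heads, tails)
    print(f"{heads}/{tails}: chi2 = {stat:.3f}, p = {p:.3f}")
```

Both results have p-values far above 0.05, with the 510/490 split giving the smaller of the two, in line with the comment above.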

Let me explain: a) one can never speak of the probability of the null hypothesis being true. It's true or it's false. b) The decision is not based on the probability of occurrence of the observed outcome under the null hypothesis. Let's look at the coin example. If in 1000 tosses only heads are observed, or only tails, we would reject the null hypothesis. These values certainly will belong to the critical region. When we observe 999 times heads, or 999 times tails, we will also reject H0. So these values also belong to the critical region. The same with 998 and 2 times heads. We continue this way until we arrive at N or 1000-N times heads, where N is determined by the following: The total probability of all the values in the critical region should not exceed the predetermined level of significance. Nijdam (talk) 23:17, 12 February 2012 (UTC)
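The construction described above, adding outcomes to the critical region from the extremes inwards until the significance level would be exceeded, can be sketched as follows. This assumes a fair coin under the null hypothesis and a symmetric two-sided region; the function name `critical_region` and its return convention are illustrative assumptions.

```python
from fractions import Fraction
from math import comb

def critical_region(n, alpha):
    """Find the largest c such that P(heads <= c or heads >= n - c) <= alpha
    for a fair coin. Returns (c, size), or (None, 0) if no such c exists."""
    total = 2 ** n
    lower_count = 0                      # number of sequences with at most k heads
    cutoff, size = None, Fraction(0)
    for k in range((n + 1) // 2):        # stop before the two tails could overlap
        lower_count += comb(n, k)
        candidate = Fraction(2 * lower_count, total)  # both tails, by symmetry
        if candidate > alpha:
            break
        cutoff, size = k, candidate
    return cutoff, size

alpha = Fraction(5, 100)
c, size = critical_region(1000, alpha)
print(f"reject if heads <= {c} or heads >= {1000 - c}; size = {float(size):.4f}")

# With only 5 flips there is no two-sided critical region at the 5% level.
print(critical_region(5, alpha))
```

The decision then depends only on whether the observed count falls in this region, not on the probability of the observed count itself.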
Apologies - I should have said, the probability of being incorrect if we were to reject the null hypothesis based on that data (that is, of thinking that we have a significant result when we do not). Or equivalently, I suppose, that if we had to use the current data to make predictions we should assume "as if" the null hypothesis were true. That being said, the null hypothesis is indeed true or false, but we do not know this and can only ever reach the point where we have very high confidence, depending on how much data we have gathered.
I'm pretty sure your statement b is incorrect though. A result where p = 0.05 means that there is a 5% probability that we would observe that result (i.e. a result that deviates at least that far from the expected value) if the null hypothesis were true, does it not? That is, the integral over the areas of the probability distribution at least that far from the expected value is 0.05. Arc de Ciel (talk) 11:10, 12 March 2012 (UTC)

Strangely enough, you got it and you don't. Speaking of an integral with this example is impossible; that's why I mentioned the total probability, being the sum. And you say: ... observe that result ..., and there lies the problem in what you say. Perhaps you mean to say the right thing but you don't, as you seem to indicate by the extension between brackets. Nijdam (talk) 14:46, 12 March 2012 (UTC)

Yes, I am speaking in the abstract, so "that result" is not meant to specifically refer to any particular result. I spoke of an integral because the probability distribution is well-approximated by a continuous curve. Arc de Ciel (talk) 04:00, 13 March 2012 (UTC)
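The two views in this exchange, an exact sum over discrete outcomes and an integral under an approximating continuous curve, can be compared numerically. This is a minimal sketch, assuming a fair coin, a two-sided tail defined by distance from n/2, and a normal approximation with continuity correction; both function names are hypothetical.

```python
from math import comb, erfc, sqrt

def exact_two_sided_p(heads, n):
    """Sum of fair-coin probabilities of all counts at least as far from n/2."""
    d = abs(heads - n / 2)
    count = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= d)
    return count / 2 ** n

def normal_approx_p(heads, n):
    """Normal-curve approximation of the same tail area (continuity-corrected)."""
    d = abs(heads - n / 2)
    sd = sqrt(n) / 2                      # standard deviation of heads for a fair coin
    z = max(d - 0.5, 0.0) / sd
    return erfc(z / sqrt(2))              # P(|Z| >= z) for a standard normal Z

for heads in (510, 532):
    print(heads, round(exact_two_sided_p(heads, 1000), 4),
          round(normal_approx_p(heads, 1000), 4))
```

For 1000 flips the two numbers agree closely, which is why the continuous curve is often used in practice even though the underlying distribution is discrete.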

## Copy-edits

Made a bunch of changes. Hope you like them. Lfstevens (talk) 21:21, 7 June 2010 (UTC)

Went from a steady 1 to 3k traffic to 143k today, thanks to this XKCD cartoon. Mercurywoodrose (talk) 17:20, 30 April 2011 (UTC)

## Needs mention & definition of "point null hypothesis"

I'm browsing through several academic papers on mathematical statistics & seeing mention of the concept "point null hypothesis". E.g. in the 2003 Hubbard & Bayarri article "P-values are not error probabilities" [sorry, no link], the authors state:

"In Neyman–Pearson theory, therefore, the researcher chooses a (usually) point null hypothesis and tests it against the alternative hypothesis."

I'm guessing that a typical point null hypothesis would be something like : 'mean of sample' = 'expected population mean'. Or in words : "the chosen statistical estimator (sample mean) is equal to some given single stated value. The 'point' is the single, exact stated numeric value, as opposed to a range of values."

Is this correct? Could someone knowledgeable please comment and/or edit the article..?

Brad (talk) 01:18, 22 June 2011 (UTC)

It is not really correct. See the existing section "Directionality" for an outline of why a real-world null hypothesis is often reducible to a simple "point" null hypothesis. Melcombe (talk) 08:16, 22 June 2011 (UTC)
OK, thanks Melcombe for your comment & reference. Please note though, you've still left the phrase "point null hypothesis" formally undefined ;)
Of course, when I wrote "not really correct", that was referring to your quote about researchers starting from a point null hypothesis. I have added some new stuff in a terminology section that is an attempt to define some of the terminology, but this risks the article becoming long and overly formal and, if that happens, there would be a need to consider merging the article into statistical hypothesis testing. Melcombe (talk) 13:57, 23 June 2011 (UTC)
Melcombe, thanks for adding the Terminology section. This is good progress, IMHO :) I disagree however that it risks making the article "long & overly formal". Clear definitions are always appreciated by newcomers to a topic. I would actually ask that you or someone else please add an example of each type of NH to the new section..? Thanks again! Brad (talk) 23:17, 3 July 2011 (UTC)
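As a possible starting point for the requested examples, here is a hedged sketch in standard textbook notation. The symbols $\mu$ (population mean) and $\mu_0$ (a fixed stated value) are assumptions introduced for illustration and are not taken from the article's Terminology section. Note that the hypotheses are statements about the population parameter, not about the sample mean.

$$
\begin{aligned}
\text{point null:}\quad & H_0 : \mu = \mu_0 \quad \text{against} \quad H_1 : \mu \neq \mu_0 \\
\text{one-sided (not a point) null:}\quad & H_0 : \mu \le \mu_0 \quad \text{against} \quad H_1 : \mu > \mu_0
\end{aligned}
$$

In the first case the null hypothesis fixes the parameter at a single value; in the second it allows a range of values.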

## Composite and/or compound

I come across the terms composite and also compound for a hypothesis that allows for more than one distribution. Is there a preference, or are both common terms? Nijdam (talk) 20:48, 15 October 2011 (UTC)

Are you confusing composite hypotheses with compound distributions? By "more than one distribution" do you mean "more than one family of distributions" ... a composite hypothesis includes "more than one distribution" in the sense of "more than one distribution having different parameter values within the same family of distributions". I have not seen "compound hypothesis" ... it is in none of my 3 stats dictionaries. Melcombe (talk) 12:34, 17 October 2011 (UTC)

Look on the internet; even some reputable statistical articles use the term "compound". Nijdam (talk) 18:55, 17 October 2011 (UTC)

A coarse and simple tool that will often allow you to determine the relative usage of a term is at [6]. Formerly it gave the numbers of each usage; now it only indicates the "victor" in the "battle" (and, for a more detailed word count you would need to go to Google itself for each of the two terms). The observation that "even some reputable statistical articles use the term "compound"" is irrelevant, and producing that fact as evidence is a flawed argument; just because some people call koalas "bears" does not change the fact that they are marsupials. Also, the general term for the incorrect usage of a word or expression is catachresis; and it certainly seems that any individual applying the term compound hypothesis is using the expression catachrestically. Lindsay658 (talk) 21:37, 17 October 2011 (UTC)
Fine with me, I just asked. Nijdam (talk) 22:40, 18 October 2011 (UTC)

## Causality violation?

The article currently states:

However, formulating the null hypothesis before collecting data rejects a true null hypothesis only a small percent of the time.

Surely, if taken literally, this violates causality: the collected data cannot be affected by when the formulation was made. Perhaps this should be reworded: You should think about what the hell you're doing before you start taking data? linas (talk) 21:26, 10 April 2012 (UTC)

I have rephrased this. I guess the point being attempted was related to Testing hypotheses suggested by the data and I have now linked this in. Melcombe (talk) 09:20, 13 April 2012 (UTC)

## Please clarify re. "very unlikely"

In the "Principle" section, the following sentence is confusing to the statistical layman: "If the data-set is very unlikely, defined as belonging to a set of data that only rarely will be observed (usually in less than either 5% of the time or 1% of the time), the experimenter rejects the null hypothesis concluding it (probably) is false."

What does this mean? Here is my best guess, based on the prose around the sentence in question. If my null hypothesis is true, then I expect certain observations to be rare, and appear 1% to 5% of the time. If I get a dataset in which these rarely-expected results are common, then that rejects the null hypothesis.

Is that correct? If so, what is the definition of "common"? If I expect rare events 1% to 5% of the time, what percentage of rare events do I need for a data-set to be "very unlikely"?

I don't consider myself sufficiently expert to edit this page. I do feel qualified enough to point out when the prose is confusing to those of us who are not expert statisticians. I'd like to ask if those with the proper expertise might consider rewriting or expanding the sentence quoted above. I think the topic is important, and worthy of prose that the lay person can understand.

Thanks! Coachjpg (talk) 13:02, 8 May 2012 (UTC)

No, not quite correct. It is not the observations (meaning individual observations) that need to be rare, but the overall set of observations (taken all together) that needs to be rare. In practice this concept is simplified in that the overall set of observations is reduced to the observed value of a test statistic, and it is the rareness of occurrence of more extreme values of the test statistic that is used to judge rareness. Inherent in the definition of the particular test statistic used is a particular way of defining the particular type of divergence from the null hypothesis that is being looked for. The "rareness" needs to be considered as a whole for the whole collection of observations ... what proportion of new sets of observations would produce more extreme values of the test statistic. Melcombe (talk) 15:45, 8 May 2012 (UTC)
I have attempted a rephrasing of the paragraph. Is it any better? Much more detail is/should be found in statistical hypothesis testing, and the text in the present article needs to be limited so as not to duplicate too much material. Melcombe (talk) 16:15, 8 May 2012 (UTC)
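The idea of judging rareness by "what proportion of new sets of observations would produce more extreme values of the test statistic" can be illustrated by simulation. This is a minimal sketch under stated assumptions: the data are coin flips, the null hypothesis is a fair coin, the test statistic is the distance of the head count from half the number of flips, and "more extreme" is taken to mean "at least as extreme". The function name and the example numbers (60 heads in 100 flips) are hypothetical.

```python
import random

def simulated_p_value(observed_heads, n_flips, n_sims=10_000, seed=0):
    """Proportion of simulated fair-coin data sets whose test statistic
    is at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed_stat = abs(observed_heads - n_flips / 2)
    extreme = 0
    for _ in range(n_sims):
        # One new data set: n_flips fair flips, counted as the set bits of a random integer.
        heads = bin(rng.getrandbits(n_flips)).count("1")
        if abs(heads - n_flips / 2) >= observed_stat:
            extreme += 1
    return extreme / n_sims

p = simulated_p_value(observed_heads=60, n_flips=100)
print(f"simulated p-value: {p:.3f}")
```

The whole data set is reduced to one number (the statistic), and rareness is judged for that number across many hypothetical repetitions, not for any individual flip.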

## James Alcock on the null hypothesis

I'm going to insert a section discussing the null hypothesis and parapsychology from psychologist James Alcock (whose page I'm rewriting). I think his paper clearly explains what the null hypothesis is, which will help readers better understand this concept. http://www.imprint.co.uk/pdf/Alcock-editorial.pdf Sgerbic (talk) 15:19, 9 June 2012 (UTC)

## Point hypothesis

I'm not sure whether the term "point hypothesis" is a regular notion in statistics. At least the description in the article is not very clear. Nijdam (talk) 10:21, 12 June 2012 (UTC)

## Stats on wikip

The first sentence of this article, unlike those of most other wiki articles, makes it hard to see what it is getting at! It should explain things in far easier terms too. Stats on wiki is pretty hard to get compared to most subjects! — Preceding unsigned comment added by 137.43.182.225 (talk) 13:20, 5 December 2012 (UTC)

## Marked article for much needed editing

There is not enough citation in this article, and some of the citations do not come from academic sources. I am particularly concerned about the lead remark that "there is no relationship between two measured phenomena." Additionally, the lead makes it sound like the null hypothesis is central to scientific research, when in fact, it has been criticized.

The introduction has a few remarks that read more like opinion and are not supported by the academic literature in statistics.

Moving further into the article there is a lack of citation. Thelema418 (talk) 07:59, 10 December 2013 (UTC)