Talk:Analysis of variance
|WikiProject Statistics||(Rated C-class, High-importance)|
|This is the talk page for discussing improvements to the Analysis of variance article.|
"Fortunately, experience says that high order interactions are rare"
The above sentence, found in the subsection "ANOVA for multiple factors", is horribly unscientific, even though a reference is given. Experience with what? Clearly no one has gained experience with all possible topics in which an ANOVA may be used. The text of the reference is not easily accessed, so it is not possible to check under what conditions this "experience" is applicable. I have marked it with a [verification needed] to signal that this claim needs to be modified. While it is relevant to the topic, the sentence as presented is false.
Explain "no fit at all"
The example in the introduction is excellent, but the following statement needs some elaboration: "An attempt to explain the weight distribution by dividing the dog population into groups (young vs old)(short-haired vs long-haired) would probably be a failure (no fit at all)."
Someone thinking of the weight distribution as an empirical histogram can object that any histogram can be written as a sum of histograms corresponding to subgroups of the population. The phrase "no fit at all" might be interpreted as a claim that the blue histograms do not actually add up to the yellow one.
What the sentence is trying to convey is that "success" at dividing the population into categories means that, if you are told a dog's category, you can use the corresponding histogram to estimate the dog's weight well. Hence "no fit at all" refers to the fact that if you are told that a dog is (for example) young, you still can't make a good guess of its weight from the weight histogram of young dogs.
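To make the point concrete, here is a minimal Python sketch of one way to measure how well a grouping "fits": the fraction of the total sum of squares that is accounted for by the group means. The weights, the group labels, and the function name `explained_fraction` are all made up for illustration; they are not taken from the article.

```python
from statistics import mean

def explained_fraction(groups):
    """Share of the total sum of squares accounted for by the group means."""
    all_values = [w for g in groups.values() for w in g]
    grand = mean(all_values)
    ss_total = sum((w - grand) ** 2 for w in all_values)
    # Between-group sum of squares: each group mean weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    return ss_between / ss_total

# Hypothetical dog weights in kg; the same eight dogs are split two ways.
by_age = {"young": [5, 20, 35, 50], "old": [6, 21, 34, 51]}
by_size_class = {
    "small": [5, 6],
    "medium": [20, 21],
    "large": [34, 35],
    "giant": [50, 51],
}

age_fit = explained_fraction(by_age)          # close to 0: "no fit at all"
size_fit = explained_fraction(by_size_class)  # close to 1: a very good fit
print(age_fit, size_fit)
```

In both splits the subgroup histograms add up to the same overall histogram, so the objection above is correct as far as it goes. The difference is that knowing a dog is "young" says almost nothing about its weight, while knowing its size class says almost everything.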
Improve or remove use of "treatment," "factor," "factor level"
The term "treatment" is apparently central in this exposition of ANOVA. It appears early in the "Background and terminology" section, but ... before it is defined!
The later definition of "treatment" in the "Design-of-experiments terms" section says it is "a combination of factor levels." What kind of combination of levels? A sum of the level numbers? Look up "factor" to find out about factor levels: a factor is an investigator-manipulated process that causes a change in output. What kind of process might this mean? Adding and removing data? Why would you do that? Output of what? Might this "output" refer to how variance changes when the investigator manipulates the data like this? How can a process have "levels"?
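For what it's worth, one plausible reading of "a combination of factor levels" is the usual design-of-experiments one: pick one level from each factor, and each such pick is a treatment. A minimal Python sketch under that assumption follows. The factor names and levels are hypothetical and do not come from the article.

```python
from itertools import product

# Hypothetical factors, each with the levels the investigator can set.
factors = {
    "fertilizer": ["none", "low", "high"],
    "watering": ["daily", "weekly"],
}

# Under this reading, a treatment is one choice of level per factor,
# so the treatments are the Cartesian product of the level lists.
treatments = [dict(zip(factors, combo)) for combo in product(*factors.values())]

for t in treatments:
    print(t)
```

On this reading "combination" means a tuple of settings, not a sum of level numbers, and three fertilizer levels crossed with two watering levels give six treatments. If that is what the article intends, it should say so where "treatment" is first used.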