WikiProject Statistics (Rated Start-class, Mid-importance)
- 1 Feasibility of Fisher Exact test
- 2 similarity to Kullback-Leibler divergence
- 3 G^2
- 4 fisher.g.test in GeneTS not G-test as described?
- 5 Where does the 2 come from?
- 6 More precise stating of distribution of G under null hypothesis
- 7 splitting of the G statistics
- 8 Maybe a squeamish comment about notation
- 9 How to handle zero frequencies in observations?
- 10 Better introduction
Feasibility of Fisher Exact test
Before writing the words below, I ran several such calculations using this web-based application: http://home.clara.net/sisa/twoby2.htm with the Firefox browser: for examples in which all cells had values between 10,000 and 20,000 it took about 30 seconds to finish the calculations.
For example, a laptop with a 1.7 GHz Pentium and 1 GB of RAM, specifications not considered particularly high-end in 2006, can readily handle cases of the Fisher exact test in which each cell's value is around 10,000, using commonly available statistical software.
- Reverted as off topic; not really about the G-test. Pete.Hurd 17:31, 31 July 2006 (UTC)
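As a rough illustration of why cell counts around 10,000 are computationally unproblematic, here is a minimal sketch of a two-sided Fisher exact test that works with log-gamma values instead of raw factorials. The function names are hypothetical, the large table is made up, and the two-sided rule used (sum the probabilities of all tables no more likely than the observed one) is only one common convention.

```python
import math

def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    log_denominator = log_choose(n, col1)

    def log_prob(x):
        # Hypergeometric log-probability that the top-left cell equals x,
        # with all margins held fixed.
        return log_choose(row1, x) + log_choose(row2, col1 - x) - log_denominator

    log_p_observed = log_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = 0.0
    for x in range(lowest, highest + 1):
        lp = log_prob(x)
        # Small tolerance so that ties with the observed table are included.
        if lp <= log_p_observed + 1e-7:
            p_value += math.exp(lp)
    return min(1.0, p_value)

# Small table that can be checked by hand: the p-value is 34/70.
small_p = fisher_exact_two_sided(3, 1, 1, 3)

# Made-up table with cells of the size discussed above; the loop runs over
# roughly 20,000 candidate tables.
large_p = fisher_exact_two_sided(10000, 12000, 11000, 10500)
```

The loop is linear in the smaller margin, so the cost grows with the cell counts but stays modest at this scale.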
similarity to Kullback-Leibler divergence
Do the G-test and the Kullback-Leibler divergence express the same thing, just from different points of view?
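One way to make the connection concrete, sketched under the assumption that G is defined as in the article, with observed counts O_i, expected counts E_i, and a common total N (the lower-case proportions o_i and e_i are notation introduced here, not taken from the article):

```latex
\[
G \;=\; 2\sum_i O_i \ln\frac{O_i}{E_i}
  \;=\; 2N \sum_i o_i \ln\frac{o_i}{e_i}
  \;=\; 2N \, D_{\mathrm{KL}}(o \,\|\, e),
\qquad
o_i = \frac{O_i}{N}, \quad e_i = \frac{E_i}{N}, \quad N = \sum_i O_i = \sum_i E_i .
\]
```

Under these assumptions G is the Kullback-Leibler divergence of the expected proportions from the observed proportions, scaled by 2N.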
G^2
Note that the "G-test" is also referred to as the G^2 (G-squared) test (at least in psychology-related statistics).
- Humph. I've never seen that - please give a reference. seglea 23:29, 22 July 2005 (UTC)
To name a few references to G^2 in psychological statistics (it is common in multinomial modeling work in the memory literature and is becoming more common in fitting other types of models as well):
Dodson, Holland, & Shimamura (1998). Using Excel to estimate parameters from observed data: An example from source memory data. Behavior Research Methods, Instruments, & Computers, 30(3), 517-526.
Batchelder & Riefer (1999). Theoretical and empirical review of multinomial process tree modeling. Psychonomic Bulletin & Review, 6(1), 57-86.
Bayen, Murnane, & Erdfelder (1996). Source Discrimination, Item Detection, and Multinomial Models of Source Monitoring. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22(1), 197-215.
Erdfelder & Buchner (1998). Process-Dissociation Measurement Models: Threshold Theory or Detection Theory? Journal of Experimental Psychology: General, 127(1), 83-96.
fisher.g.test in GeneTS not G-test as described?
fisher.g.test implemented in GeneTS is an exact test for whether a time series is different from Gaussian white noise, not the alternative to the chi-square test as described.
Where does the 2 come from?
I've been trying to work out how Pearson's formula is an approximation for this test.
This is the formula for Pearson's chi-squared statistic, except that the factor of 2 is still there. What was my error? Thanks! — ciphergoth 14:11, 3 June 2006 (UTC)
- Your approximation for ln(1+x) wasn't good enough; it roughly works for each term, but its errors for positive and negative values reinforce each other, which is enough to account for the factor of 2. Taking the -x^2/2 term as well and applying another approximation should get you there. --Henrygb 15:01, 9 March 2007 (UTC)
Then please tell me where my error is:
- You made a mistake in the second-to-last equality:
- —Preceding unsigned comment added by 220.127.116.11 (talk) 07:38, 23 February 2008 (UTC)
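For reference, here is a sketch of the second-order expansion suggested above. The shorthand \delta_i = O_i - E_i is introduced here for convenience, and the derivation assumes that observed and expected counts share the same total, so that \sum_i \delta_i = 0.

```latex
\begin{align*}
G &= 2\sum_i O_i \ln\frac{O_i}{E_i}
   = 2\sum_i (E_i + \delta_i)\,\ln\!\left(1 + \frac{\delta_i}{E_i}\right) \\
  &\approx 2\sum_i (E_i + \delta_i)\left(\frac{\delta_i}{E_i} - \frac{\delta_i^2}{2E_i^2}\right)
   && \text{using } \ln(1+x) \approx x - \tfrac{x^2}{2} \\
  &= 2\sum_i \left(\delta_i + \frac{\delta_i^2}{E_i} - \frac{\delta_i^2}{2E_i} - \frac{\delta_i^3}{2E_i^2}\right) \\
  &\approx 2\sum_i \delta_i + \sum_i \frac{\delta_i^2}{E_i}
   && \text{dropping the cubic terms} \\
  &= \sum_i \frac{(O_i - E_i)^2}{E_i}
   && \text{since } \textstyle\sum_i \delta_i = 0 .
\end{align*}
```

With only the first-order term ln(1+x) ≈ x, the quadratic contribution would be \delta_i^2/E_i instead of \delta_i^2/(2E_i), which leaves the stray factor of 2.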
More precise stating of distribution of G under null hypothesis
This sentence should be made more precise
- Given the null hypothesis that the observed frequencies result from random sampling from a distribution with the given expected frequencies, the distribution of G is approximately that of chi-squared, with the same number of degrees of freedom as in the corresponding chi-squared test.
Does it converge in distribution? So does the Pearson chi-squared statistic, right? Is the asymptotic rate of convergence quicker for one than for the other? I don't have any references on the G-test, so I'm afraid I won't be of any help answering these questions.
Andyrew609 19:39, 27 November 2006 (UTC)
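This does not settle the question about convergence rates, but a small simulation can at least show how both statistics behave under the null hypothesis at a given sample size. The sketch below is an assumption-laden illustration: the function names, the uniform four-category null, and the sample sizes are all made up, and 7.815 is the 0.95 quantile of chi-squared with 3 degrees of freedom, hard-coded to match the four categories.

```python
import math
import random

def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)); cells with O == 0 contribute nothing."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

def pearson_statistic(observed, expected):
    """Pearson's chi-squared statistic, sum((O - E)^2 / E)."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def rejection_rates(n_samples=2000, sample_size=200, n_categories=4,
                    critical_value=7.815, seed=0):
    """Fraction of null samples in which each statistic exceeds the critical value.

    The default critical value is only valid for n_categories == 4 (3 degrees
    of freedom); change both together.
    """
    rng = random.Random(seed)
    expected = [sample_size / n_categories] * n_categories
    g_rejections = 0
    pearson_rejections = 0
    for _ in range(n_samples):
        # Draw one multinomial sample from the uniform null distribution.
        observed = [0] * n_categories
        for _ in range(sample_size):
            observed[rng.randrange(n_categories)] += 1
        if g_statistic(observed, expected) > critical_value:
            g_rejections += 1
        if pearson_statistic(observed, expected) > critical_value:
            pearson_rejections += 1
    return g_rejections / n_samples, pearson_rejections / n_samples

g_rate, pearson_rate = rejection_rates()
print(f"G rejection rate: {g_rate:.3f}, Pearson rejection rate: {pearson_rate:.3f}")
```

If the chi-squared approximation is adequate at this sample size, both rates should land near the nominal 0.05. Repeating the run with smaller `sample_size` values gives a rough empirical feel for how quickly each statistic approaches its limit.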
splitting of the G statistics
I am currently going through Agresti's Categorical Data Analysis (2002), and on page 82 he gives a clean explanation of how to partition the G statistic. (P.S.: be aware that in the 2007 edition of the book most of this section was cut, so don't bother looking for it there.)
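To illustrate what such a partition looks like, here is a minimal sketch of one standard scheme for a 2x3 table: compare the first two columns, then compare their pooled counts against the third column. This is not necessarily the exact presentation on the page cited above; the table values and function name are made up for the example.

```python
import math

def g_independence(table):
    """G statistic for independence in a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if observed > 0:  # empty cells contribute nothing to the sum
                g += observed * math.log(observed / expected)
    return 2.0 * g

# Hypothetical 2x3 table of counts.
table = [[20, 30, 50],
         [35, 25, 40]]

# Component 1: columns 1 and 2 only.
first_component = [row[:2] for row in table]
# Component 2: columns 1 and 2 pooled, compared against column 3.
second_component = [[row[0] + row[1], row[2]] for row in table]

g_full = g_independence(table)
g_parts = g_independence(first_component) + g_independence(second_component)
print(g_full, g_parts)
```

The two component statistics, each with one degree of freedom, add up to the G statistic of the full table (two degrees of freedom). The additivity is exact for G, whereas for Pearson's statistic it holds only approximately.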
Maybe a squeamish comment about notation
In the G formulae, it would be more correct to put the larger brackets outside the summation operator, so that they enclose the whole indexed expression being summed.
How to handle zero frequencies in observations?
Since the formula
$G = 2\sum_i O_i \ln(O_i/E_i)$
uses the logarithm, how are terms handled where $O_i = 0$?
- I guess you should use Pearson's in that case, as implicitly recommended for small : “[T]he approximation to the theoretical chi-square distribution for the G-test is better than for the Pearson chi-squared tests in cases where for any cell |Oi − Ei | > Ei, and in any such case the G-test should always be used.”
- l0b0 (talk) 15:45, 16 April 2009 (UTC)
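Another common way to handle this, sketched here as an assumption and not as something stated in the article, is to treat a cell with O_i = 0 as contributing zero to the sum, since O ln(O/E) tends to 0 as O tends to 0. The function name below is hypothetical.

```python
import math

def g_statistic_with_zeros(observed, expected):
    """G = 2 * sum(O * ln(O / E)), treating cells with O == 0 as contributing 0.

    This follows the limit O * ln(O / E) -> 0 as O -> 0. Expected counts must
    be strictly positive, otherwise the ratio O / E is undefined.
    """
    total = 0.0
    for o, e in zip(observed, expected):
        if e <= 0:
            raise ValueError("expected counts must be positive")
        if o > 0:
            total += o * math.log(o / e)
        # o == 0: skip the term instead of evaluating log(0)
    return 2.0 * total

# One empty cell out of two, with equal expected counts.
g_value = g_statistic_with_zeros([0, 10], [5, 5])
print(g_value)  # equals 20 * ln(2)
```

Whether the chi-squared approximation is still trustworthy with empty cells is a separate question; this only removes the undefined logarithm.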