Talk:Balanced repeated replication
WikiProject Statistics (Rated Start-class)
The Hadamard matrix article had a fairly lengthy section given over to this topic, and it seemed out of place and, er, unbalanced there, so I've given it a page of its own. I am very far from being an expert on this, and there may be mistakes. Here are some specific points at which I am unsure whether what I've written is adequate.
1. (This wasn't addressed at all by the material in Hadamard matrix.) When the size of the Hadamard matrix and the number of strata don't quite match, our half-sample choices need no longer be exactly orthogonal (= perfectly uncorrelated). Presumably one should use a Hadamard matrix whose size is as close as possible to that of the number of strata. Is there any lore concerning the choice of rows from that matrix? (Choosing them at random would probably work OK, but maybe one can do better.) Does our estimate of the sampling variance want adjusting to reflect this failure of orthogonality?
2. (This is a point at which I'm not sure I understood the material in Hadamard matrix correctly.) If there are too many strata and one chooses to combine them into "variance strata": (a) how many is too many? (b) is there any theory about how one should choose which ones to combine together? (c) does the subsequent calculation need adjusting?
3. The only reference I consulted other than the Hadamard matrix article was the one listed at the bottom of the article. It doesn't in fact mention the Hadamard matrix technique, and seems to assume that all 2^s possible half-samples are used. It would be nice to have a reference for the Hadamard matrix stuff.
4. I didn't understand the following text from the Hadamard matrix article:
Another complication occurs if there are several stages of sampling. Unfortunately, BRR will not capture the variance from these other sampling stages, and only captures the variance from the first stage (the between-PSU variance). One final twist is when the multi-stage sample design has "certainty PSUs" (these are in all first-stage samples, that is, they are sampled with "certainty"). In these situations, for the certainty PSUs only, we usually drop down to the stage-two sample, and for BRR we now let variance strata equal stage-two sampling strata, and variance PSUs be made up of stage-two sampling units (secondary sampling units or SSUs). If there are (conditional) stage two certainties, we subdivide further (to TSUs), and so on. Often, to increase efficiency, the strata at these lower levels are collapsed, often to the PSU, and the PSU is now the variance stratum.
(it appears to be talking about cluster sampling rather than stratified sampling, and it's not clear to me how one does BRR with cluster sampling) so I'm leaving it here in the hope that someone who knows more about statistical sampling than I do will translate it into more comprehensible language and work it into the article.
5. It's not clear to me what the advantages and disadvantages of "Fay's method" are. The AIR page says only "In some instances this approach improves the estimates of statistics such as medians and percentiles".
6. The article might be improved by some comparison with other methods of variance estimation. I don't know enough statistics to say anything useful about that.
7. The AIR page offers two references that are almost certainly useful, but since I haven't actually checked them I think it would be irresponsible to put them in the reference list in the article. Here they are, for future reference:
7a. McCarthy, P J (1969) Pseudo-replication: half samples. Review of the International Statistical Institute 37:239-264.
7b. Särndal, C E; Swensson, B; Wretman, J (1992) Model-assisted survey sampling Springer-Verlag, New York.
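A rough sketch bearing on points 1, 3 and 5, offered as an illustration under stated assumptions and not as what the article or its reference prescribes. It assumes the convention in which each row of the Hadamard matrix is one replicate and each stratum is assigned its own column (skipping the all-ones column), with all rows used. Under that convention any set of distinct columns stays mutually orthogonal, so a matrix larger than the number of strata still gives a balanced design. The matrix is built with Sylvester's construction, which only yields power-of-2 orders; Hadamard matrices of other orders (12, 20, ...) are not covered. The function names and the PSU totals are made up for the example. The estimator is a plain stratum total with two PSUs per stratum, which is linear; the agreement shown between the balanced set, all 2^s half-samples, and Fay-style weights (factors 2-k and k, with the sum of squares divided by (1-k)^2, as the method is usually described) is specific to linear estimators and says nothing about medians or percentiles.

```python
from itertools import product


def sylvester_hadamard(order):
    """Hadamard matrix of a power-of-2 order, as a list of rows of +1/-1."""
    if order < 1 or order & (order - 1):
        raise ValueError("Sylvester's construction needs a power-of-2 order")
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def balanced_design(num_strata):
    """Rows are replicates, columns are strata; column 0 (all ones) is skipped."""
    order = 4
    while order <= num_strata:  # need at least num_strata columns after column 0
        order *= 2
    return [row[1:num_strata + 1] for row in sylvester_hadamard(order)]


def is_balanced(design):
    """True if every pair of stratum columns is orthogonal over the replicates."""
    cols = list(zip(*design))
    return all(
        sum(a * b for a, b in zip(cols[i], cols[j])) == 0
        for i in range(len(cols))
        for j in range(i + 1, len(cols))
    )


def replicate_total(signs, strata, k=0.0):
    """Weighted total for one replicate: +1 up-weights PSU 1, -1 up-weights PSU 2.

    k = 0 gives ordinary half-samples (weights 2 and 0); 0 < k < 1 gives
    Fay-style weights (2 - k and k).
    """
    total = 0.0
    for s, (y1, y2) in zip(signs, strata):
        w1, w2 = (2 - k, k) if s == 1 else (k, 2 - k)
        total += w1 * y1 + w2 * y2
    return total


def replication_variance(sign_rows, strata, k=0.0):
    """Mean squared deviation of replicate totals from the full-sample total."""
    full = sum(y1 + y2 for y1, y2 in strata)
    rows = list(sign_rows)
    ss = sum((replicate_total(r, strata, k) - full) ** 2 for r in rows)
    return ss / (len(rows) * (1 - k) ** 2)


# Hypothetical PSU totals: three strata, two PSUs each.
strata = [(3, 5), (10, 6), (2, 9)]
design = balanced_design(len(strata))  # 4 replicates for 3 strata

v_bal = replication_variance(design, strata)
v_all = replication_variance(product((1, -1), repeat=len(strata)), strata)
v_fay = replication_variance(design, strata, k=0.5)
print(is_balanced(design), v_bal, v_all, v_fay)  # True 69.0 69.0 69.0
```

Here 69 is the sum of squared within-stratum differences (4 + 16 + 49), so for this linear case four balanced replicates reproduce what all eight half-samples give. None of this answers whether an adjustment is needed when fewer than all rows are used, or how Fay's method behaves for non-smooth statistics.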