Talk:Sampling bias

From Wikipedia, the free encyclopedia
WikiProject Statistics (Rated C-class, Mid-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.

This article has been rated as C-Class on the quality scale.
This article has been rated as Mid-importance on the importance scale.
WikiProject Philosophy (Rated C-class, Mid-importance)
This article is within the scope of WikiProject Philosophy, a collaborative effort to improve the coverage of content related to philosophy on Wikipedia. If you would like to support the project, please visit the project page, where you can get more details on how you can help, and where you can join the general discussion about philosophy content on Wikipedia.
This article has been rated as C-Class on the project's quality scale.
This article has been rated as Mid-importance on the project's importance scale.


(to Martin) Hi, this is your namesake, Mrdice. I'm into the logical fallacies at the moment, and I noticed you moved some articles around. For example, you put the spotlight fallacy together with biased sample. Is it alright with you if I give each fallacy, however closely related it is to another, its own page? Most fallacies are in some way or another related to each other, and if we're going to put similar ones together, it would be a huge job, even more so because some of them belong to different categories at the same time. My idea is to give each one its own page, and then mention at the bottom of the page to which ones they're related. Mrdice 03:19, 2004 Feb 16 (UTC)

"At best, this means the people who care most about an issue will answer"

This isn't necessarily best; it can be quite the opposite. Those who care most aren't bound to be true reflections of the population, especially if there's more room or inclination for people to care (or not) one way than the other. I can imagine this: Some government proposal has already been given the green light or even just been implemented. Many people were in favour of it, but they are already set to get their way and so might not bother voting. OTOH those who are against the idea are likely to flood the poll in protest.

It's also possible that the statement of a poll can be biased, by stating only one side of the argument. I can imagine this leading to biased results.... -- Smjg 15:31, 17 Jun 2005 (UTC)

Some bad splitting occurred here...

This article has turned into a childish description of one specific type of misuse of statistics. Biased sampling has many forms and is sometimes unavoidable, but when the type of bias in the sampling mechanism is known it can be taken into account and possibly corrected in the analysis (that is, one can draw inference for the unbiased population given a biased sample). So it needs a major rewrite I cannot provide at the time.--Boffob 18:23, 12 January 2007 (UTC)

Cleaning up

This could use an overhaul in spelling and grammar. If it weren't so late I'd do it myself. I'll come around and do it sometime if no one else wants to. (forgot to sign my post!) Imasleepviking 13:46, 1 February 2007 (UTC)

There are some sentence fragments as well: "Provided that certain conditions are met (chiefly that the sample is drawn randomly from the entire sample) these samples."??? -- Preceding unsigned comment added by Nimajji (talk • contribs) 02:53, 25 March 2009 (UTC)

Biased samples and parameter estimation

I don't have time to discuss it much now, but I want to mention that there are many methods that can take different forms of biased sampling into account (not just reweighting of badly balanced samples), and there are cases where ignoring the bias will still lead to consistent (if inefficient) estimates of the parameters of interest (or a subset of them). So saying that estimates will always be erroneous and that statistical methods always assume that samples are representative of the target population is simply untrue. Though I reverted some recent changes made by an anonymous IP, I agree that the "problems with biased samples" section needs some rewriting.--Boffob (talk) 01:35, 26 January 2008 (UTC)

This may sound like a quibble, but the article should not then say that any statistic calculated from a biased sample has the potential to be erroneous (I think that was part of the point of my edit, but I haven't checked). First of all, a stratified sample is classified by the article as a biased sample, and its accuracy can be estimated; in that sense it has no potential to be erroneous. Secondly, if by potential to be erroneous we mean that the statistic calculated from a sample may be markedly different from the parameter, that is true of any sample, so it's not a distinguishing characteristic of biased samples. Perhaps the point could be that a confidence interval cannot be calculated.
I would also like to raise the issue of whether this article is needed in addition to Stratified sampling and Non-probability sample. Phrenesiac (talk) 00:29, 27 January 2008 (UTC)
I did write this in the hope of rewording some contentious parts of the article. Of course, any estimation is subject to error. The issue here is that ignoring biased sampling can lead to, surprise, surprise, asymptotically biased estimators of the parameters of interest, as opposed to consistent estimators, but there are actually cases where some parameters will be consistently estimated despite biased sampling. The other thing is that a biased sample is not necessarily a stratified sample (deliberate sampling method over well-defined strata, but stratified sampling may still have biased sampling issues) or a non-probability sample. In many cases, the probability of sampling an individual from the target population can be computed; the issue is that it is not uniform over that target population (which would make it a random sample proper, and I realize that the Nonprobability sampling article does not define random sample properly). For example, length-biased sampling and size-biased sampling have been studied extensively. So yes, this article is needed on top of the other two, it just needs some rewriting in a few places.--Boffob (talk) 01:43, 27 January 2008 (UTC)
I apologize for not paying attention to the changes to Nonprobability sampling. I believe that definition has changed a lot since last I looked closely at it. Anyway, we seem to be talking at cross-purposes here. I'll leave to you to make the necessary changes to this article. Though God knows how long they'll last. Phrenesiac (talk) 02:17, 27 January 2008 (UTC)
No need to apologize. I don't have the other two on my watchlist, so I haven't followed them. This one has been relatively stable since the last major rewrite that improved it a lot, so I haven't bothered with it so much, but there is some room for improvement, I'm just not sure how to rewrite this "erroneous" estimation bit without getting too technical.--Boffob (talk) 05:13, 27 January 2008 (UTC)
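
As a rough illustration of the point made in this thread (that a biased sample with a known, non-uniform selection probability can still be analyzed), here is a minimal Python sketch of length-biased sampling. Everything in it is an assumption made for illustration: the uniform population, the function names, and the choice of inverse-probability (1/x) weighting as the correction. It is not taken from the article or from any source cited here.

```python
import random

def length_biased_sample(population, k, rng):
    # Draw k values with probability proportional to each value itself,
    # so larger items are over-represented (hypothetical sampling mechanism).
    return rng.choices(population, weights=population, k=k)

def naive_mean(sample):
    # Ignores the sampling mechanism entirely.
    return sum(sample) / len(sample)

def corrected_mean(sample):
    # Weights each observation by 1/x, the inverse of its relative
    # selection probability; this reduces to the harmonic mean of the sample.
    return len(sample) / sum(1.0 / x for x in sample)

rng = random.Random(1)
# Hypothetical target population: values uniform on [1, 9], true mean near 5.
population = [rng.uniform(1.0, 9.0) for _ in range(50_000)]
sample = length_biased_sample(population, 50_000, rng)

true_mean = sum(population) / len(population)
print("true mean:     ", round(true_mean, 3))
print("naive mean:    ", round(naive_mean(sample), 3))
print("corrected mean:", round(corrected_mean(sample), 3))
```

Under these assumptions the naive mean should settle near E[X^2]/E[X] (about 6.07 here) rather than the true mean of about 5, while the weighted estimate should land close to the true mean. That is the sense in which ignoring the bias gives an asymptotically biased estimator and accounting for it does not.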


The content of the Spotlight Fallacy section of this article has been identified as plagiarism - it seems to be sourced verbatim from a copyrighted source without attribution. The allegation of plagiarism can be found here: and the apparent original source here: I suggest someone either write an original entry or get permission to use this material in Wikipedia (and add an attribution). Thanks, Outeast -- Preceding unsigned comment added by (talk) 12:43, 19 February 2009 (UTC)

Yes, in this edit of 14 October 2006, this editor (most of whose contributions were deleted as inane) copied in stuff that was on that Nizkor page earlier that same month, when it clearly said "© The Nizkor Project, 1991-2009" with no mention of copyleft, let alone GFDL. A blatant copyright violation.
Outeast, if you see something like this again, anywhere, go ahead and delete it. And please mention it at the foot of the relevant talk page. Thank you. -- Hoary (talk) 14:56, 28 March 2009 (UTC)

An interesting twist on this, as it appears that the plagiarizer was the original author. See this (likely to be archived hereabouts). -- Hoary (talk) 15:56, 28 March 2009 (UTC)

Move to sampling bias

The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.

The result of the move request was page moved. Vegaswikian (talk) 22:06, 22 November 2009 (UTC)

Biased sample → Sampling bias -- As seen in Template:Biases, every bias-article that has the ability to do so is in the ___ bias format, except this one. I think the unusual naming originally was a way to make it appear different from selection bias, but I've tried to rather explain the difference in the article itself, so any extraordinary name is no longer required. Mikael Häggström (talk) 16:31, 14 November 2009 (UTC)

partial negative. A problem is that "Sampling bias" is ambiguous, as it could be interpreted as "sampling the bias". Why say "biased sample" is unusual ... if you have a biased sample you have a biased sample, it would be more unusual to say you have a sample with sampling bias, as that is longer. However, the article uses both terms in what looks to be appropriate ways in different places, so one or other term would not be excluded whatever is done. Melcombe (talk) 13:19, 18 November 2009 (UTC)

Pro I arrived at this page via the redirect. "Sampling the bias" is a novel expression to me, and I don't know what you mean by it. Could you provide examples? Paradoctor (talk) 15:29, 19 November 2009 (UTC)

The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.

Merge from Ascertainment bias

As described at the top of this article, ascertainment bias is apparently the same as sampling bias. Mikael Häggström (talk) 17:26, 14 November 2009 (UTC)

But the description at the top of ascertainment bias does seem to reveal a difference. It seems that with ascertainment bias one is not targeting the right population, or possibly even recognising that the population exists, while with sampling bias one is not sampling the (correct) population in a fair way. Melcombe (talk) 13:26, 18 November 2009 (UTC)
The relevant quotes from the two ledes are
  • "ascertainment bias occurs when false results are produced by non-random sampling"
  • "biased sample" ... "results from sampling bias (systematic error due to a non-random sample of a population)"
That's the common ground. The ascertainment bias article appears to be about nonprobability sampling, which is non-random sampling in which the selection probability for some population members is zero or unknown. The "exclusion bias" mentioned in biased sample is a subtype of nonprobability sampling.
Since both articles are rather short, there is no need for a separate article for ascertainment, and the prudent course of action would be to expand the paragraph on exclusion bias, to include anything useful from ascertainment bias and to mention the alternate name, and to redirect ascertainment there. Paradoctor (talk) 17:11, 19 November 2009 (UTC)

The merge of sampling and ascertainment biases is extremely misleading. As mentioned above, ascertainment bias is an ex post error of judgement created by nonprobability sampling, whereas sampling bias occurs mostly ex ante during data collection. Merging both notions does not help to understand either. If you want to use an umbrella term, it should be selection bias.

I strongly recommend separating all notions, with illustrations:

  • sampling bias occurs if you try to conduct an online survey with no controls -- see, e.g., [1]
  • ascertainment bias occurs if you try to establish psychiatric diagnoses among violent criminals -- see, e.g., [2]
  • selection bias describes both.

An expert judgement would be needed to verify my claim, but I strongly believe that something is very wrong with the current state of the entry. ---- phnk (talk) 19:57, 21 January 2011 (UTC)

Although ascertainment bias might be a type of sampling bias, they are certainly not the same. It's extremely odd that the articles have been merged, such that ascertainment bias is not described at all. (talk) 11:25, 13 July 2012 (UTC)

Merged, but in which types of sampling bias should these examples belong?

I merged ascertainment bias to here, but most examples were already mentioned, and I found it unnecessary to have multiple examples for each type. Yet, if you find any of these very important, feel free to add them too. Mikael Häggström (talk) 05:24, 4 May 2010 (UTC)

For example, to find the male/female ratio in a country it is not necessary to count everyone in the country: selection of a statistical sample of the population will be adequate. The way the sample is selected can influence the result. For example, if the residents of a housing project for elderly persons were counted, the result could be biased in favor of females, who statistically live longer than males.

A simple classroom demonstration of ascertainment bias is to estimate the primary sex ratio (which we know to be around 1:1) by asking all female students to report the ratio in their own families, and comparing the result with the same question asked of male students. The females will collectively report a higher ratio of females, as all families having only male children are excluded by the selection criterion. The males will report a higher ratio of males, for the complementary reason.
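
The classroom demonstration above can be sketched as a small simulation. This is only an illustrative sketch under stated assumptions: every family has exactly two children, each child is independently female or male with probability 1/2, and each qualifying family is reported exactly once. The function name and parameters are hypothetical.

```python
import random

def reported_share_of_girls(reporter_sex, n_families=100_000, n_children=2, seed=0):
    """Share of girls among children in the families that get reported.

    reporter_sex is "F", "M", or None (None means every family is counted,
    i.e. no selection on the reporter's sex).
    """
    rng = random.Random(seed)
    girls = total = 0
    for _ in range(n_families):
        children = [rng.choice("FM") for _ in range(n_children)]
        # A family is only reported if it contains a child of the reporter's sex,
        # so all-male families are excluded when girls report, and vice versa.
        if reporter_sex is not None and reporter_sex not in children:
            continue
        girls += children.count("F")
        total += len(children)
    return girls / total

print("all families:      ", round(reported_share_of_girls(None), 3))
print("reported by girls: ", round(reported_share_of_girls("F"), 3))
print("reported by boys:  ", round(reported_share_of_girls("M"), 3))
```

With two-child families, the share of girls should come out near one half when all families are counted, near two thirds when only female students report, and near one third when only male students report, even though the underlying ratio is 1:1 throughout.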

Ascertainment bias is important in studying the genetics of medical conditions, since data are typically collected by physicians in a clinical setting. The results may be skewed because the sample is of patients who have seen a physician, rather than a random sample of the population as a whole. Berkson's paradox illustrates this effect.
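
A minimal sketch of the clinical-setting effect, in the style of Berkson's paradox, might look like the following. The setup is entirely hypothetical: two independent conditions A and B, each with 10% prevalence, and a clinic that only ever sees people with at least one of them. Real ascertainment mechanisms are more complicated than this.

```python
import random

def conditional_rates(people):
    """Return (P(B | A), P(B | not A)) for a list of (has_a, has_b) pairs."""
    with_a = [b for a, b in people if a]
    without_a = [b for a, b in people if not a]
    return sum(with_a) / len(with_a), sum(without_a) / len(without_a)

rng = random.Random(2)
# Hypothetical population: A and B are independent, each with 10% prevalence.
population = [(rng.random() < 0.1, rng.random() < 0.1) for _ in range(200_000)]
# Hypothetical clinic sample: only people with at least one condition are seen.
clinic = [(a, b) for a, b in population if a or b]

pop_rates = conditional_rates(population)
clinic_rates = conditional_rates(clinic)
print("population P(B|A), P(B|not A):", pop_rates)
print("clinic     P(B|A), P(B|not A):", clinic_rates)
```

In the full population the two rates should be nearly equal, as expected for independent conditions. In the clinic sample, every patient without A must have B, so the two conditions appear strongly negatively associated purely because of how patients were ascertained.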

Often, robust experimental design can minimize this effect. Another way to deal with this effect is to take the non-random sampling into account when analyzing results.

Dewey : Phone Sampling

The section on "Historical Examples" seems to contain an error. It asserts that the Tribune came out with the "Dewey Defeats Truman" headline due to reliance on a "phone survey". Yet the source cited for this section says something quite different: This source asserts that the Gallup Poll for the election was taken two weeks before the election, and not updated. Nothing is said about this having been a "phone poll," and the actual reason for the incorrect headline was the Tribune's reliance on old data, not a "phone poll". I think an earlier portion of the article discussing the 1936 Landon-Roosevelt election has been conflated with the Dewey-Truman material. -- Preceding unsigned comment added by (talk) 23:46, 19 May 2011 (UTC)

an article link

There is a short article on sample selection and consistency of estimators that discusses different sample selection equations, and I would like to link to it. See it here. Let me know if that would be fine.

- I have now added the link for selection equations (15 Nov 2011). -- Preceding unsigned comment added by Esben.juel (talk • contribs) 18:01, 15 November 2011 (UTC)