Sampling frame

From Wikipedia, the free encyclopedia

In statistics, a sampling frame is the source material or device from which a sample is drawn.[1] It is a list of all those within a population who can be sampled, and may include individuals, households or institutions.[1]

The importance of the sampling frame is stressed by Jessen:[2]

In many practical situations the frame is a matter of choice to the survey planner, and sometimes a critical one. [...] Some very worthwhile investigations are not undertaken at all because of the lack of an apparent frame; others, because of faulty frames, have ended in a disaster or in cloud of doubt.

—Raymond James Jessen

Sampling frame types and qualities

In the most straightforward case, such as when dealing with a batch of material from a production run, or using a census, it is possible to identify and measure every single item in the population and to include any one of them in the sample; this is known as direct element sampling.[1] However, in many other cases this is not possible, either because it is cost-prohibitive (reaching every citizen of a country) or impossible (reaching all humans alive).

Once the frame has been established, there are a number of ways of organizing it to improve efficiency and effectiveness. It is at this stage that the researcher should decide whether the sample is in fact to be the whole population, in which case it would be a census.

The frame should also facilitate access to the selected sampling units. A frame may also provide additional 'auxiliary information' about its elements; when this information is related to variables or groups of interest, it may be used to improve survey design. While not necessary for simple sampling, a sampling frame used for more advanced sampling techniques, such as stratified sampling, may contain additional information (such as demographic information).[1] For instance, an electoral register might include name and sex; this information can be used to ensure that a sample taken from that frame covers all demographic categories of interest. (Sometimes the auxiliary information is less explicit; for instance, a telephone number may provide some information about location.)
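As a minimal sketch of how auxiliary information can drive stratified sampling, the following Python example groups a frame by one auxiliary field and draws the same fraction from each group. The frame contents, the field names (`id`, `name`, `sex`) and the function `stratified_sample` are hypothetical and serve only as an illustration; real designs may allocate the sample across strata differently.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_key, fraction, seed=0):
    """Draw the same fraction of units from every stratum of a list frame."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Group the frame's units by the auxiliary variable.
    strata = defaultdict(list)
    for unit in frame:
        strata[unit[stratum_key]].append(unit)

    sample = []
    for label in sorted(strata):
        units = strata[label]
        # Take at least one unit per stratum so every category is covered.
        n = max(1, round(len(units) * fraction))
        sample.extend(rng.sample(units, n))
    return sample


# Hypothetical electoral-register frame with name and sex as auxiliary data.
frame = [
    {"id": i, "name": f"Voter {i}", "sex": "F" if i % 2 else "M"}
    for i in range(1, 101)
]
sample = stratified_sample(frame, "sex", 0.1)
```

Because selection happens separately within each group, no category recorded in the frame can be left out of the sample by chance.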

An ideal sampling frame will have the following qualities:[1]

  • all units have a logical, numerical identifier
  • all units can be found – their contact information, map location or other relevant information is present
  • the frame is organized in a logical, systematic fashion
  • the frame has additional information about the units that allows the use of more advanced sampling techniques
  • every element of the population of interest is present in the frame
  • every element of the population is present only once in the frame
  • no elements from outside the population of interest are present in the frame
  • the data is 'up-to-date'[3]

The most straightforward type of frame is a list of elements of the population (preferably the entire population) with appropriate contact information. For example, in an opinion poll, possible sampling frames include an electoral register or a telephone directory. Other sampling frames can include employment records, school class lists, patient files in a hospital, organizations listed in a thematic database, and so on.[1][4] On a more practical level, sampling frames have the form of computer files.[1]
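When the frame is a computer file listing every element, drawing a simple random sample is a short operation. The sketch below assumes a hypothetical frame of 500 records, each with an identifier and a made-up telephone number; the function name `simple_random_sample` is likewise only illustrative.

```python
import random


def simple_random_sample(frame, n, seed=0):
    """Select n distinct units from a list frame, each equally likely."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical list frame: one record per element, with contact information.
frame = [{"id": i, "phone": f"555-{i:04d}"} for i in range(1, 501)]
sample = simple_random_sample(frame, 25)
```

Only units that appear in `frame` can ever be returned, which is why the coverage of the frame matters as much as the selection procedure.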

Not all frames explicitly list population elements; some list only 'clusters'. For example, a street map can be used as a frame for a door-to-door survey; although it doesn't show individual houses, we can select streets from the map and then select houses on those streets. This offers some advantages: such a frame would include people who have recently moved and are not yet on the list frames discussed above, and it may be easier to use because it doesn't require storing data for every unit in the population, only for a smaller number of clusters.
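The street-map approach can be sketched as a two-stage selection: first choose streets from the frame, then list and choose houses only on the chosen streets. In this Python illustration the street names, the `list_houses` stand-in for fieldwork and the function `two_stage_sample` are all assumptions made for the example.

```python
import random


def two_stage_sample(clusters, n_clusters, n_per_cluster, list_units, seed=0):
    """Select clusters from the frame, then units within each selected cluster."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible

    # Stage 1: the frame lists only clusters, so sample those first.
    chosen = rng.sample(sorted(clusters), n_clusters)

    sample = []
    for cluster in chosen:
        # Stage 2: units are listed only for the clusters that were selected.
        units = list_units(cluster)
        k = min(n_per_cluster, len(units))
        sample.extend((cluster, unit) for unit in rng.sample(units, k))
    return sample


# Hypothetical cluster frame: street names taken from a map.
streets = ["Elm St", "Oak Ave", "Pine Rd", "Main St", "Hill Ln"]


def list_houses(street):
    # Stand-in for fieldwork: pretend every street has houses numbered 1 to 20.
    return list(range(1, 21))


sample = two_stage_sample(streets, 2, 3, list_houses)
```

Here house data is needed for two streets rather than for all five, which mirrors the storage advantage described above.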

Sampling frame problems

The sampling frame must be representative of the population, and whether it is so is a question outside the scope of statistical theory, demanding the judgment of experts in the particular subject matter being studied. All the above frames omit some people who will vote at the next election and contain some people who will not; some frames will contain multiple records for the same person. People not in the frame have no prospect of being sampled.

Because a cluster-based frame contains less information about the population, it may place constraints on the sample design, possibly requiring the use of less efficient sampling methods and/or making it harder to interpret the resulting data.

Statistical theory tells us about the uncertainties in extrapolating from a sample to the frame. It should be expected that sampling frames will always contain some mistakes.[4] In some cases, this may lead to sampling bias.[1] Such bias should be identified and minimized, although avoiding it completely in the real world is nearly impossible.[1] One should also not assume that sources which claim to be unbiased and representative are such.[1]
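As an illustration of the kind of uncertainty that statistical theory does quantify, the sketch below computes the estimated mean and its standard error for a simple random sample drawn without replacement, using the standard finite population correction. The function name `estimate_mean` and the example values are assumptions made for this illustration. The result describes uncertainty about the frame only; it says nothing about errors in the frame itself.

```python
import math
import statistics


def estimate_mean(sample_values, frame_size):
    """Estimate the frame mean and its standard error from a simple random sample."""
    n = len(sample_values)
    mean = statistics.mean(sample_values)
    s2 = statistics.variance(sample_values)  # sample variance (n - 1 denominator)
    fpc = 1 - n / frame_size  # finite population correction
    se = math.sqrt(fpc * s2 / n)
    return mean, se


# Hypothetical measurements on 5 units sampled from a frame of 50 units.
mean, se = estimate_mean([1, 2, 3, 4, 5], frame_size=50)
```

When the sample is the whole frame, the correction factor is zero and so is the standard error, yet any units missing from the frame remain unaccounted for.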

In defining the frame, practical, economic, ethical, and technical issues need to be addressed. The need to obtain timely results may prevent extending the frame far into the future. The difficulties can be extreme when the population and frame are disjoint. This is a particular problem in forecasting, where inferences about the future are made from historical data. In fact, in 1703, when Jacob Bernoulli proposed to Gottfried Leibniz the possibility of using historical mortality data to predict the probability of early death of a living man, Leibniz recognized the problem in his reply:[5]

Nature has established patterns originating in the return of events but only for the most part. New illnesses flood the human race, so that no matter how many experiments you have done on corpses, you have not thereby imposed a limit on the nature of events so that in the future they could not vary.

—Gottfried Leibniz

Leslie Kish posited four basic problems of sampling frames:[6]

  1. Missing elements: Some members of the population are not included in the frame.
  2. Foreign elements: Non-members of the population are included in the frame.
  3. Duplicate entries: A member of the population appears more than once in the frame, and so may be surveyed more than once.
  4. Groups or clusters: The frame lists clusters instead of individuals.

Problems like those listed can be identified by the use of pre-survey tests and pilot studies.
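As a rough sketch of such a check, the first three of Kish's problems can be detected by comparing the frame's identifiers with a trusted reference list, for example one compiled for a small pilot area. The existence of such a reference list, the function name `frame_problems` and the identifiers used are assumptions of this example; the fourth problem concerns the structure of the frame and is not covered here.

```python
from collections import Counter


def frame_problems(frame_ids, population_ids):
    """Compare frame identifiers with a trusted reference list of the population."""
    counts = Counter(frame_ids)
    in_frame = set(counts)
    population = set(population_ids)
    return {
        # Population members that the frame omits.
        "missing": sorted(population - in_frame),
        # Frame entries that do not belong to the population.
        "foreign": sorted(in_frame - population),
        # Identifiers listed more than once in the frame.
        "duplicates": sorted(i for i, c in counts.items() if c > 1),
    }


# Hypothetical pilot check: the frame lists unit 2 twice, omits 4 and includes 9.
report = frame_problems([1, 2, 2, 3, 9], [1, 2, 3, 4])
```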

References

  1. Carl-Erik Särndal; Bengt Swensson; Jan Wretman (2003). Model assisted survey sampling. Springer. pp. 9–12. ISBN 978-0-387-40620-6. Retrieved 2 January 2011.
  2. Raymond James Jessen (1978). Statistical survey techniques. Wiley. Retrieved 2 January 2011.
  3. Anthony G. Turner. "Sampling frames and master samples". United Nations Secretariat. Retrieved 12/11/2012.
  4. Roger Sapsford; Victor Jupp (29 March 2006). Data collection and analysis. SAGE. pp. 28–. ISBN 978-0-7619-4363-1. Retrieved 2 January 2011.
  5. Peter L. Bernstein (1998). Against the gods: the remarkable story of risk. John Wiley and Sons. pp. 118–. ISBN 978-0-471-29563-1. Retrieved 2 January 2011.
  6. Leslie Kish (1995). Survey sampling. Wiley. ISBN 978-0-471-10949-5. Retrieved 11 January 2011.