Top-coded

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 133.11.242.248 at 05:46, 29 May 2013.

In econometrics and statistics, a top-coded data set is one in which values of a variable above an upper bound are censored. This is often done to preserve the anonymity of survey participants (for example, if a survey included a person with wealth of $51 billion, that record would not be anonymous because people would know it is Bill Gates).

Example: Top-coding of wealth at 30,000

id   age   actual wealth   wealth variable in data set
1    26    24,778          24,778
2    32    26,750          26,750
3    45    26,780          26,780
4    64    35,469          30,000+
5    27    43,695          30,000+
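
The table above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `top_code` and the "+" suffix convention are illustrative choices, not part of any standard library or survey tool.

```python
def top_code(values, cap):
    """Return (reported_value, is_top_coded) pairs.

    Values above `cap` are reported as `cap` and flagged as censored;
    all other values are reported unchanged.
    """
    return [(min(v, cap), v > cap) for v in values]


# Actual wealth figures from the example table
actual_wealth = [24778, 26750, 26780, 35469, 43695]

coded = top_code(actual_wealth, cap=30000)

# Format as in the table: censored entries carry a trailing "+"
labels = [f"{value:,}+" if censored else f"{value:,}" for value, censored in coded]
print(labels)
```

Keeping the censoring flag alongside the capped value matters in practice: once only the capped number is stored, a true value of exactly 30,000 can no longer be told apart from a top-coded one.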

Jacob S. Hacker and Paul Pierson argue that the practice of top-coding, or capping the reported maximum value on tax returns ostensibly to protect the earner's anonymity, complicates the analysis of the distribution of wealth in the United States.[1]

Implications for ordinary least squares

  • If the lower bound of the top-coded group is used as a regressor value (30000 in the example above), OLS is biased and inconsistent.
  • The top-coded group can be omitted from the regression entirely. Provided there are no systematic differences between the omitted group and the included groups, OLS is consistent and unbiased.
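  • The Tobit procedure, which treats top-coded observations of the dependent variable as right-censored, accounts for top-coding and gives consistent estimates, provided its assumptions of normally distributed, homoskedastic errors hold.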
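
The first two points can be illustrated with a small simulation. The sketch below rests on assumptions that are not in the article: the top-coded variable is the regressor x, the true model is y = 1 + 2x + noise with standard normal x and noise, and the cap is 0.5. All data are simulated, and the helper `ols_slope` is a hypothetical name. The Tobit case is not covered here.

```python
import random


def ols_slope(x, y):
    """Slope from a least-squares fit of y on x (one regressor plus intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(0)  # fixed seed so the run is repeatable
n, cap, true_slope = 20000, 0.5, 2.0

# Simulated (hypothetical) data: y = 1 + 2*x + noise
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 + true_slope * xi + rng.gauss(0.0, 1.0) for xi in x]

# Case 1: use the cap as the regressor value for top-coded observations
x_capped = [min(xi, cap) for xi in x]
slope_capped = ols_slope(x_capped, y)

# Case 2: omit the top-coded observations entirely
kept = [(xi, yi) for xi, yi in zip(x, y) if xi <= cap]
slope_dropped = ols_slope([xi for xi, _ in kept], [yi for _, yi in kept])

print(f"true slope:                {true_slope}")
print(f"cap used as regressor:     {slope_capped:.3f}")
print(f"top-coded group omitted:   {slope_dropped:.3f}")
```

Under these assumptions the slope from the capped regressor overshoots the true value, because the capped observations have larger outcomes than their recorded x would suggest. Dropping them recovers a slope close to the true one, since the omitted group follows the same linear relationship as the rest.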

References

  1. ^ Hacker, Jacob S. and Paul Pierson (2010). Winner-Take-All Politics: How Washington Made the Rich Richer--And Turned Its Back on the Middle Class. Simon & Schuster. p. 13. ISBN 978-1-4165-8869-6.
  • Tobin, James (1958). "Estimation of relationships for limited dependent variables". Econometrica 26 (1), 24–36.