Top-coded


In econometrics and statistics, a top-coded dataset is one in which values above a chosen upper bound are censored: any value at or above the bound is reported only as being at or above it, so its exact magnitude is not known. This is often done to preserve the anonymity of people participating in a survey. For example, if a survey included a person with wealth of $51 billion, the record would not be anonymous, because people would know it is Bill Gates.

Example: Top-coding of wealth

id  age  income  status
1   26   24778   exact value
2   32   26750   exact value
3   45   26780   exact value
4   32   30000+  top coded
5   45   30000+  top coded
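
The table above can be reproduced with a minimal sketch, assuming a cap of 30000. The function name top_code is hypothetical, and so are the two raw incomes above the cap (41200 and 87500): the table reports them only as 30000+, so the true values are not recoverable from it.

```python
def top_code(values, cap):
    """Censor each value at `cap`; return (reported_value, is_top_coded) pairs."""
    return [(min(v, cap), v >= cap) for v in values]


# The first three incomes come from the table; the last two are invented
# stand-ins for the unknown values hidden behind "30000+".
raw_incomes = [24778, 26750, 26780, 41200, 87500]

for person_id, (reported, coded) in enumerate(top_code(raw_incomes, 30000), start=1):
    label = f"{reported}+ top coded" if coded else f"{reported} exact value"
    print(person_id, label)
```

After top-coding, the records for ids 4 and 5 are indistinguishable in the income column, which is the point of the procedure and also the source of the estimation problems it causes.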

Jacob S. Hacker and Paul Pierson argue that the practice of top-coding, or capping the reported maximum value on tax returns ostensibly to protect the earner's anonymity, complicates the analysis of the distribution of wealth in the United States.[1]

Implications for ordinary least squares

  • If the lower bound of the top-coded group (30000 in the example above) is substituted for the unknown values and used as a regressor value, OLS is biased and inconsistent.
  • The top-coded group can be omitted from the regression entirely. Provided there are no systematic differences between the omitted group and the included groups, OLS is consistent and unbiased.
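  • The Tobit procedure (Tobin, 1958) is designed for a censored dependent variable. When the dependent variable is top-coded and the errors are normally distributed and homoskedastic, it gives consistent estimates.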
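
The first two points can be illustrated with a small simulation, offered as a sketch under stated assumptions rather than a general proof. The data-generating process (a normally distributed regressor, a true slope of 3, a cap at 60) and the names ols_slope and simulate are all hypothetical choices made for this illustration. Tobit estimation is not shown, since it requires maximum likelihood rather than a closed-form slope.

```python
import random


def ols_slope(xs, ys):
    """Slope of a simple OLS regression of ys on xs (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def simulate(n=20000, cap=60.0, true_slope=3.0, seed=0):
    """Compare OLS slopes when a top-coded regressor is capped vs. dropped."""
    rng = random.Random(seed)
    # Assumed data-generating process: y = 2 + true_slope * x + noise.
    xs = [rng.gauss(50.0, 20.0) for _ in range(n)]
    ys = [2.0 + true_slope * x + rng.gauss(0.0, 10.0) for x in xs]

    # Option 1: replace every top-coded x with the cap itself.
    capped_xs = [min(x, cap) for x in xs]
    slope_capped = ols_slope(capped_xs, ys)

    # Option 2: omit the top-coded observations entirely.
    kept = [(x, y) for x, y in zip(xs, ys) if x < cap]
    slope_dropped = ols_slope([x for x, _ in kept], [y for _, y in kept])

    return slope_capped, slope_dropped


slope_capped, slope_dropped = simulate()
print(f"true slope: 3.0, capped: {slope_capped:.3f}, dropped: {slope_dropped:.3f}")
```

In this setup the slope from the capped regressor should come out noticeably above the true value of 3, while the slope from the sample with top-coded observations omitted should stay close to 3. The second result relies on the omitted group following the same relationship as the rest, which holds here by construction.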

References

  1. Hacker, Jacob S. and Paul Pierson (2010). Winner-Take-All Politics: How Washington Made the Rich Richer--And Turned Its Back on the Middle Class. Simon & Schuster. p. 13. ISBN 978-1-4165-8869-6.
  • Tobin, James (1958). "Estimation of Relationships for Limited Dependent Variables". Econometrica 26 (1), 24–36.