In econometrics and statistics, a top-coded data set is one in which data points whose values are above an upper bound are censored. This is often done to preserve the anonymity of survey respondents: for example, if a survey included a person with wealth of $79 billion, that record would not be anonymous, because people would know there is a good chance it is Bill Gates.
Example: Top-coding of wealth at 30,000
| id | age | actual wealth | wealth variable in data set |
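Top-coding itself is a one-line transformation. The sketch below assumes the convention stated above, where values strictly above the bound are replaced by the bound, and uses made-up wealth values. It is not a reconstruction of the example table. `top_code` is a hypothetical helper that also returns a flag for each censored record.

```python
from typing import List, Tuple


def top_code(values: List[float], bound: float) -> Tuple[List[float], List[bool]]:
    """Censor values above `bound` by replacing them with `bound`.

    Returns the reported values and a per-record flag marking which
    records were top-coded.
    """
    reported = [min(v, bound) for v in values]
    flagged = [v > bound for v in values]
    return reported, flagged


# Hypothetical wealth values (not from any real survey), top-coded at 30,000.
actual_wealth = [12_000, 29_500, 30_000, 45_000, 2_500_000]
reported, flagged = top_code(actual_wealth, 30_000)
print(reported)  # [12000, 29500, 30000, 30000, 30000]
print(flagged)   # [False, False, False, True, True]
```

Without the flag, the third record (exactly 30,000) cannot be told apart from the two censored ones, and the reported value understates the last record by a wide margin.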
Jacob S. Hacker and Paul Pierson argue that the practice of top-coding, or capping the reported maximum value on tax returns ostensibly to protect the earner's anonymity, complicates the analysis of the distribution of wealth in the United States.
Implications for ordinary least squares
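- If the lower bound of the top-coded group (30,000 in the example above) is used as the regressor value for every top-coded observation, OLS is biased and inconsistent.
- The top-coded group can be omitted from the regression entirely. Provided the omitted observations do not differ systematically from the included ones, OLS is unbiased and consistent.
- The Tobit model, a censored regression model estimated by maximum likelihood, accounts for top-coding of the dependent variable explicitly. It gives consistent estimates when its assumption of normally distributed, homoskedastic errors holds, but it is not robust to violations of that assumption.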
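The first two options for a top-coded regressor can be compared in a small simulation. This is a minimal sketch assuming NumPy and a hypothetical linear model y = 1 + 2x + e, with x uniform on [0, 10], standard normal errors, and the regressor top-coded at 6 rather than the 30,000 of the wealth example. The helper `ols_slope` is illustrative, not taken from any package.

```python
import numpy as np


def ols_slope(x: np.ndarray, y: np.ndarray) -> float:
    """Slope from a simple OLS regression of y on x with an intercept."""
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


# Hypothetical data-generating process: y = 1 + 2x + e, x ~ Uniform(0, 10), e ~ N(0, 1).
rng = np.random.default_rng(0)
n = 200_000
x = rng.uniform(0.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)

# Top-code the regressor: values above the bound are reported as the bound.
bound = 6.0
x_reported = np.minimum(x, bound)
top_coded = x > bound

# Option 1: use the bound as the regressor value for the top-coded group.
slope_bound = ols_slope(x_reported, y)

# Option 2: omit the top-coded group; the exclusion depends only on x, not on e.
slope_omit = ols_slope(x_reported[~top_coded], y[~top_coded])

print("true slope: 2.000")
print(f"bound used as value: {slope_bound:.3f}")
print(f"top-coded group omitted: {slope_omit:.3f}")
```

Under these assumptions the first estimate converges to 2 × 5.4 / 3.96 ≈ 2.73 rather than 2, because the top-coded observations keep their large outcomes while their regressor values are compressed to the bound. Dropping the top-coded group shrinks the sample, but since the exclusion depends only on x, the estimate should stay close to 2.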
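The Tobit approach can be sketched with NumPy alone. This is a minimal sketch under stated assumptions: the outcome, not the regressor, is top-coded; the errors are normal and homoskedastic; and the data are simulated with a hypothetical intercept of 1, slope of 2, error standard deviation of 2, and bound of 15. The Tobit model is usually fitted by maximizing its likelihood directly. Here the EM algorithm is swapped in because it needs only a normal density and tail probability, and its fixed point is the same maximum-likelihood estimate. The helpers `tobit_em`, `norm_pdf`, and `norm_sf` are hypothetical names. For comparison, the code also fits naive OLS to the top-coded outcome.

```python
import math

import numpy as np

# Vectorised complementary error function; otypes avoids errors on empty input.
_erfc = np.vectorize(math.erfc, otypes=[float])


def norm_pdf(z: np.ndarray) -> np.ndarray:
    """Standard normal density."""
    return np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)


def norm_sf(z: np.ndarray) -> np.ndarray:
    """Upper-tail probability 1 - Phi(z) of the standard normal."""
    return 0.5 * _erfc(z / math.sqrt(2.0))


def tobit_em(X, y, c, max_iter=1000, tol=1e-10):
    """Fit y* = X beta + e, e ~ N(0, sigma^2), observing y = min(y*, c).

    Uses the EM algorithm, whose fixed point is the Tobit maximum-likelihood
    estimate. X must include an intercept column. Returns (beta, sigma).
    """
    cens = y >= c
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # start from naive OLS
    sigma = float(np.std(y - X @ beta))
    proj = np.linalg.solve(X.T @ X, X.T)  # (X'X)^{-1} X', reused every M-step
    for _ in range(max_iter):
        mu = X @ beta
        # E-step: moments of y* given y* > c for the top-coded records.
        a = (c - mu[cens]) / sigma
        mills = norm_pdf(a) / norm_sf(a)  # inverse Mills ratio
        y_fill = y.astype(float)  # astype returns a copy
        y_fill[cens] = mu[cens] + sigma * mills
        cond_var = np.zeros(len(y))
        cond_var[cens] = sigma**2 * (1.0 + a * mills - mills**2)
        # M-step: OLS on the filled-in outcome, then update sigma.
        new_beta = proj @ y_fill
        new_sigma = math.sqrt(np.mean((y_fill - X @ new_beta) ** 2 + cond_var))
        done = np.max(np.abs(new_beta - beta)) < tol and abs(new_sigma - sigma) < tol
        beta, sigma = new_beta, new_sigma
        if done:
            break
    return beta, sigma


# Hypothetical data: y* = 1 + 2x + e, e ~ N(0, 2^2), top-coded at c = 15.
rng = np.random.default_rng(1)
n = 5_000
x = rng.uniform(0.0, 10.0, n)
y_star = 1.0 + 2.0 * x + rng.normal(0.0, 2.0, n)
c = 15.0
y = np.minimum(y_star, c)
X = np.column_stack([np.ones(n), x])

ols_beta, *_ = np.linalg.lstsq(X, y, rcond=None)
tobit_beta, tobit_sigma = tobit_em(X, y, c)
print(f"OLS slope on top-coded y: {ols_beta[1]:.3f}")
print(f"Tobit slope: {tobit_beta[1]:.3f}, sigma: {tobit_sigma:.3f}")
```

OLS on the top-coded outcome flattens the slope, because high outcomes are pulled down to the bound. The EM fit should land close to the true slope and error standard deviation, both 2, as long as the normality and homoskedasticity assumptions built into the simulation hold.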