Multilevel regression with poststratification
|Part of a series on|
Multilevel regression with poststratification (MRP) (sometimes called "Mister P") is a statistical technique used for correcting model estimates for known differences between a sample population (the population of the data you have), and a target population (a population you would like to estimate for). For example, Wang et. al. used survey data from Xbox gamers to predict U.S. presidential election results. The Xbox gamers were 65% 18- to 29-year-olds and 93% male, while the electorate as a whole was 19% 18- to 29-year-olds and 47% male.
The poststratification refers to the process of adjusting the estimates, essentially a weighted average of estimates from all possible combinations of attributes (in this example age and sex, though there were more). Each combination is sometimes called a "cell." The multilevel regression is used to smooth noisy estimates in the cells with too little data by using overall or nearby averages.
One application is estimating preferences in sub-regions (e.g., states, individual constituencies) based on individual-level survey data gathered at other levels of aggregation (e.g., national surveys).
The technique and its advantages
The technique essentially involves using data from, for example, censuses relating to various types of people corresponding to different characteristics (e.g., age, race), in a first step to estimate the relationship between those types and individual preferences (i.e., multi-level regression of the dataset). This relationship is then used in a second step to estimate the sub-regional preference based on the number of people having each type/characteristic in that sub-region (a process known as "poststratification"). In this way the need to perform surveys at sub-regional level, which can be expensive and impractical in an area (e.g., a country) with many sub-regions (e.g. counties, ridings, or states), is avoided. It also avoids issues with consistency of survey when comparing different surveys performed in different areas. Additionally, it allows the estimating of preference within a specific locality based on a survey taken across a wider area that includes relatively few people from the locality in question, or where the sample may be highly unrepresentative.
The technique was originally developed by Gelman and T. Little in 1997, building upon ideas of Fay and Herriot and R. Little. It was subsequently expanded on by Park, Gelman, and Bafumi in 2004 and 2006. It was proposed for use in estimating US-state-level voter preference by Lax and Philips in 2009. Warshaw and Rodden subsequently proposed it for use in estimating district-level public opinion in 2012. Wang et al. subsequently used it for estimating the outcome of the 2012 US presidential election based on a survey of Xbox users, and it has also been proposed for use in the field of epidemiology.
Limitations and extensions
Both the "multilevel regression" and "poststratification" ideas of MRP can be generalized. Multilevel regression can be replaced by nonparametric regression or regularized prediction, and poststratification can be generalized to allow for non-census variables, i.e. poststratification totals that are estimated rather than being known.
- Wang, Wei; Rothschild, David; Goel, Sharad; Gelman, Andrew (2015). "Forecasting elections with non-representative polls" (PDF). International Journal of Forecasting. 31 (3): 980–991. doi:10.1016/j.ijforecast.2014.06.001.
- Buttice, Matthew K.; Highton, Benjamin (Autumn 2013). "How Does Multilevel Regression and Poststratification Perform with Conventional National Surveys?". Political Analysis. 21 (4): 449–451. doi:10.1093/pan/mpt017. JSTOR 24572674.
- "What is MRP?". Survation.com. Survation. Retrieved 31 October 2019.
- Gelman, Andrew; Lax, Jeffrey; Phillips, Justin; Gabry, Jonah; Trangucci, Robert (28 August 2018). "Using Multilevel Regression and Poststratification to Estimate Dynamic Public Opinion" (PDF): 1–3. Retrieved 31 October 2019. Cite journal requires
- Downes, Marnie; Gurrin, Lyle C.; English, Dallas R.; Pirkis, Jane; Currier, Diane; Spital, Matthew J.; Carlin, John B. (9 April 2018). "Multilevel Regression and Poststratification: A Modeling Approach to Estimating Population Quantities From Highly Selected Survey Samples". American Journal of Epidemiology. 179 (8): 187. Retrieved 31 October 2019.
- Gelman, Andrew; Little, Thomas (1997). "Poststratification into many categories using hierarchical logistic regression". Survey Methodology. 23: 127–135.
- Fay, Robert; Herriot, Roger (1979). "Estimates of income for small places: An application of James-Stein procedures to census data". Journal of the American Statistical Association. 74 (423): 1001–1012. doi:10.1080/01621459.1979.10482505. JSTOR 2286322.
- Little, Roderick (1993). "Post-stratification: A modeler's perspective". Journal of the American Statistical Association. 88 (423): 1001–1012. doi:10.1080/01621459.1993.10476368. JSTOR 2290792.
- Revell, Timothy (9 June 2017). "How YouGov's experimental poll correctly called the UK election". New Scientist. Retrieved 31 October 2019.
- Cohen, Daniel (27 September 2019). "'I've never known voters be so promiscuous': the pollsters working to predict the next UK election". The Guardian. Retrieved 31 October 2019.
- James, William; MacLellan, Kylie (15 October 2019). "A question of trust: British pollsters battle to call looming election". Reuters. Retrieved 31 October 2019.
- Bisbee, James (2019). "BARP: Improving Mister P Using Bayesian Additive Regression Trees". American Political Science Review. 113 (4): 1060–1065. doi:10.1017/S0003055419000480.
- Gelman, Andrew (28 October 2018). "MRP (or RPP) with non-census variables". Statistical Modeling, Causal Inference, and Social Science.