Multilevel regression with poststratification

Multilevel regression with poststratification (MRP) is a statistical technique used for estimating preferences in sub-regions (e.g., states, individual constituencies) based on individual-level survey data gathered at other levels of aggregation (e.g., national surveys).[1]

The technique and its advantages

The technique has two steps. In the first, a multilevel regression is fitted to individual-level survey data to estimate the relationship between individual preferences and the characteristics that define different types of people (e.g., age, race, sub-region of residence). In the second, this relationship is used to estimate the preference in each sub-region by weighting the predicted preference of each type by the number of people of that type in the sub-region, as given by, for example, census data (a process known as "poststratification").[2] In this way the need to perform surveys at sub-regional level, which can be expensive and impractical in an area (e.g., a country) with many sub-regions (e.g., counties, ridings, or states), is avoided. It also avoids the consistency problems that arise when comparing separate surveys performed in different areas.[3][1] Additionally, it allows preferences within a specific locality to be estimated from a survey taken across a wider area that includes relatively few people from the locality in question, or where the sample may be highly unrepresentative.[4]
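The two steps can be illustrated with a minimal sketch in Python. It is not a full MRP implementation: a simple shrinkage of each cell's observed rate toward the overall survey mean stands in for the multilevel regression, and the survey responses, census counts, and the `prior_strength` parameter are all hypothetical values chosen for illustration.

```python
from collections import defaultdict

# Hypothetical survey responses: (region, age_group, supports), supports is 0 or 1.
survey = [
    ("north", "young", 1), ("north", "young", 1), ("north", "old", 0),
    ("south", "young", 1), ("south", "old", 0), ("south", "old", 0),
    ("south", "old", 1), ("north", "old", 0),
]

# Hypothetical census counts of people in each (region, age_group) cell.
census = {
    ("north", "young"): 300, ("north", "old"): 700,
    ("south", "young"): 600, ("south", "old"): 400,
}


def fit_cell_estimates(survey, cells, prior_strength=2.0):
    """Step 1 (simplified stand-in for multilevel regression).

    Shrinks each cell's observed support rate toward the overall survey
    mean, so that sparsely sampled cells borrow strength from the rest.
    prior_strength is an assumed tuning value, not an estimated one.
    """
    overall = sum(y for _, _, y in survey) / len(survey)
    n = defaultdict(int)  # respondents per cell
    s = defaultdict(int)  # supporters per cell
    for region, age, y in survey:
        n[(region, age)] += 1
        s[(region, age)] += y
    return {
        cell: (s[cell] + prior_strength * overall) / (n[cell] + prior_strength)
        for cell in cells
    }


def poststratify(estimates, census):
    """Step 2: weight each cell estimate by its census count within a region."""
    weighted = defaultdict(float)
    population = defaultdict(float)
    for (region, age), count in census.items():
        weighted[region] += count * estimates[(region, age)]
        population[region] += count
    return {region: weighted[region] / population[region] for region in population}


cell_estimates = fit_cell_estimates(survey, census.keys())
regional_support = poststratify(cell_estimates, census)
for region, share in sorted(regional_support.items()):
    print(f"{region}: {share:.3f}")
```

In practice the first step would be a fitted multilevel (hierarchical) model with group-level effects for each characteristic, but the poststratification step has the same form: a census-weighted average of the cell-level predictions.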

History

The technique was originally developed by Gelman and T. Little in 1997,[5] building upon ideas of Fay and Herriot[6] and R. Little.[7] It was expanded on by Park, Gelman, and Bafumi in 2004 and 2006. Lax and Phillips proposed it for estimating US-state-level voter preference in 2009, and Warshaw and Rodden proposed it for estimating district-level public opinion in 2012.[1] Wang et al.[8] later used it to estimate the outcome of the 2012 US presidential election based on a survey of Xbox users, and it has also been proposed for use in the field of epidemiology.[4]

The technique was used to successfully predict the 2016 election victory of Donald Trump.[9] YouGov also used the technique to successfully predict the overall outcome of the 2017 UK general election,[10] correctly predicting the result in 93% of constituencies.[11]

Limitations and extensions

MRP can be extended to estimate changes of opinion over time.[3] When used to predict elections, it works best relatively close to the polling date, after nominations have closed.[12]

Both the "multilevel regression" and "poststratification" ideas of MRP can be generalized. Multilevel regression can be replaced by nonparametric regression[13] or regularized prediction, and poststratification can be generalized to allow for non-census variables, i.e. poststratification totals that are estimated rather than being known.[14]
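Poststratifying on a non-census variable can be sketched as follows, assuming the distribution of that variable within each census cell has been estimated separately (for example, from a survey). All names and numbers here are hypothetical, and the conditional shares are taken as given rather than modeled.

```python
# Hypothetical census counts by age group (known totals).
census_counts = {"young": 400, "old": 600}

# Hypothetical estimates of P(party | age); party is the non-census variable.
party_share_given_age = {
    "young": {"left": 0.6, "right": 0.4},
    "old": {"left": 0.3, "right": 0.7},
}

# Hypothetical model predictions of support within each (age, party) cell.
predicted_support = {
    ("young", "left"): 0.8, ("young", "right"): 0.2,
    ("old", "left"): 0.7, ("old", "right"): 0.1,
}


def estimated_cell_totals(census_counts, share_given_cell):
    """Expand known census cells into estimated (age, party) cell sizes."""
    return {
        (age, party): count * share
        for age, count in census_counts.items()
        for party, share in share_given_cell[age].items()
    }


def poststratified_mean(predictions, totals):
    """Average the cell predictions, weighted by the estimated cell sizes."""
    population = sum(totals.values())
    return sum(predictions[cell] * size for cell, size in totals.items()) / population


totals = estimated_cell_totals(census_counts, party_share_given_age)
estimate = poststratified_mean(predicted_support, totals)
print(f"Estimated support: {estimate:.3f}")
```

Because the expanded cell sizes are themselves estimates, their uncertainty carries through to the final figure, which this sketch does not attempt to quantify.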

References

  1. Buttice, Matthew K.; Highton, Benjamin (Autumn 2013). "How Does Multilevel Regression and Poststratification Perform with Conventional National Surveys?". Political Analysis. 21 (4): 449–451. doi:10.1093/pan/mpt017. JSTOR 24572674.
  2. "What is MRP?". Survation.com. Survation. Retrieved 31 October 2019.
  3. Gelman, Andrew; Lax, Jeffrey; Phillips, Justin; Gabry, Jonah; Trangucci, Robert (28 August 2018). "Using Multilevel Regression and Poststratification to Estimate Dynamic Public Opinion" (PDF): 1–3. Retrieved 31 October 2019.
  4. Downes, Marnie; Gurrin, Lyle C.; English, Dallas R.; Pirkis, Jane; Currier, Diane; Spittal, Matthew J.; Carlin, John B. (9 April 2018). "Multilevel Regression and Poststratification: A Modeling Approach to Estimating Population Quantities From Highly Selected Survey Samples". American Journal of Epidemiology. 187 (8): 1780–1790. Retrieved 31 October 2019.
  5. Gelman, Andrew; Little, Thomas (1997). "Poststratification into many categories using hierarchical logistic regression". Survey Methodology. 23: 127–135.
  6. Fay, Robert; Herriot, Roger (1979). "Estimates of income for small places: An application of James-Stein procedures to census data". Journal of the American Statistical Association. 74 (366): 269–277.
  7. Little, Roderick (1993). "Post-stratification: A modeler's perspective". Journal of the American Statistical Association. 88 (423): 1001–1012. doi:10.1080/01621459.1993.10476368. JSTOR 2290792.
  8. Wang, Wei; Rothschild, David; Goel, Sharad; Gelman, Andrew (2015). "Forecasting elections with non-representative polls" (PDF). International Journal of Forecasting. 31 (3): 980–991. doi:10.1016/j.ijforecast.2014.06.001.
  9. Jones, Amy (30 October 2019). "Lib Dems will stand aside for Dominic Grieve, as polling predicts a Boris Johnson majority". Daily Telegraph. Retrieved 31 October 2019.
  10. Revell, Timothy (9 June 2017). "How YouGov's experimental poll correctly called the UK election". New Scientist. Retrieved 31 October 2019.
  11. Cohen, Daniel (27 September 2019). "'I've never known voters be so promiscuous': the pollsters working to predict the next UK election". The Guardian. Retrieved 31 October 2019.
  12. James, William; MacLellan, Kylie (15 October 2019). "A question of trust: British pollsters battle to call looming election". Reuters. Retrieved 31 October 2019.
  13. Bisbee, James (2019). "BARP: Improving Mister P Using Bayesian Additive Regression Trees". American Political Science Review. 113 (4): 1060–1065. doi:10.1017/S0003055419000480.
  14. Gelman, Andrew (28 October 2018). "MRP (or RPP) with non-census variables". Statistical Modeling, Causal Inference, and Social Science.