
Multilevel regression with poststratification: Difference between revisions

From Wikipedia, the free encyclopedia
"sometimes called"
→‎History: Removed the sentence "The technique was used to successfully predict the 2016 election victory of Donald Trump", which cited a Telegraph article, where it simply says "It used a technique called multi-level regression and post-stratification, which accurately predicted that Donald Trump would become US president in 2016." But there's no detail or source on that, so that sentence is uncorroborated.
Tag: references removed
Line 14: Line 14:
The technique was originally developed by [[Andrew Gelman|Gelman]] and T. Little in 1997 <ref>{{cite journal |last1=Gelman |first1=Andrew |last2=Little |first2=Thomas|title=Poststratification into many categories using hierarchical logistic regression | journal = Survey Methodology | date=1997 |volume=23 |page=127–135|url=https://www150.statcan.gc.ca/n1/en/catalogue/12-001-X19970023616}}</ref>, building upon ideas of Fay and Herriot<ref>{{cite journal |last1=Fay|first1=Robert |last2=Herriot|first2=Roger| title= Estimates of income for small places: An application of James-Stein procedures to census data | journal = Journal of the American Statistical Association| date=1979 |volume=74 |issue=423 |pages=1001–1012 |doi=10.1080/01621459.1979.10482505 |jstor=2286322}}</ref> and R. Little<ref>{{cite journal|last1=Little|first1=Roderick| title= Post-stratification: A modeler's perspective | journal = Journal of the American Statistical Association| date=1993 |volume=88 |issue=423|pages=1001–1012|jstor=2290792|doi=10.1080/01621459.1993.10476368}}</ref>. It was subsequently expanded on by Park, Gelman, and Bafumi in 2004 and 2006. It was proposed for use in estimating US-state-level voter preference by Lax and Philips in 2009. Warshaw and Rodden subsequently proposed it for use in estimating district-level public opinion in 2012.<ref name="Highton & Buttice" /> Wang et al.<ref name="wang"/> subsequently used it for estimating the outcome of the [[2012 US presidential election]] based on a survey of [[Xbox]] users, and it has also been proposed for use in the field of epidemiology.<ref name="Downes et al" />


The technique was used to successfully predict the 2016 election victory of [[Donald Trump]].<ref name="Jones">{{cite news |last1=Jones |first1=Amy |title=Lib Dems will stand aside for Dominic Grieve, as polling predicts a Boris Johnson majority |url=https://www.telegraph.co.uk/politics/2019/10/30/lib-dems-will-stand-aside-dominic-grieve-polling-predicts-boris/ |accessdate=31 October 2019 |work=Daily Telegraph |date=30 October 2019}}</ref> [[YouGov]] also used the technique to successfully predict the overall outcome of the [[2017 UK general election]],<ref name="Revell">{{cite news |last1=Revell |first1=Timothy |title=How YouGov's experimental poll correctly called the UK election |url=https://www.newscientist.com/article/2134144-how-yougovs-experimental-poll-correctly-called-the-uk-election/#ixzz63vulf5ZP |accessdate=31 October 2019 |work=New Scientist |date=9 June 2017}}</ref> correctly predicting the result in 93% of constituencies.<ref name="Cohen">{{cite news |last1=Cohen |first1=Daniel |title='I've never known voters be so promiscuous': the pollsters working to predict the next UK election |url=https://www.theguardian.com/politics/2019/sep/27/voters-so-promiscuous-the-pollsters-working-to-predict-next-election |accessdate=31 October 2019 |work=The Guardian |date=27 September 2019}}</ref>
[[YouGov]] used the technique to successfully predict the overall outcome of the [[2017 UK general election]],<ref name="Revell">{{cite news |last1=Revell |first1=Timothy |title=How YouGov's experimental poll correctly called the UK election |url=https://www.newscientist.com/article/2134144-how-yougovs-experimental-poll-correctly-called-the-uk-election/#ixzz63vulf5ZP |accessdate=31 October 2019 |work=New Scientist |date=9 June 2017}}</ref> correctly predicting the result in 93% of constituencies.<ref name="Cohen">{{cite news |last1=Cohen |first1=Daniel |title='I've never known voters be so promiscuous': the pollsters working to predict the next UK election |url=https://www.theguardian.com/politics/2019/sep/27/voters-so-promiscuous-the-pollsters-working-to-predict-next-election |accessdate=31 October 2019 |work=The Guardian |date=27 September 2019}}</ref>


== Limitations and Extensions ==

Revision as of 17:40, 21 August 2020

Multilevel regression with poststratification (MRP), sometimes called "Mister P", is a statistical technique for correcting model estimates for known differences between a sample population (the population represented in the data at hand) and a target population (the population for which estimates are wanted). For example, Wang et al.[1] used survey data from Xbox gamers to predict U.S. presidential election results. The Xbox gamers were 65% 18- to 29-year-olds and 93% male, while the electorate as a whole was 19% 18- to 29-year-olds and 47% male.

Poststratification is the process of adjusting the estimates: the final estimate is essentially a weighted average of the estimates for every possible combination of attributes (in this example age and sex, though more were used), with each combination weighted by its share of the target population. Each combination is sometimes called a "cell." The multilevel regression smooths noisy estimates in cells with too little data by pulling them toward overall or nearby averages.
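The weighted average can be illustrated with a minimal sketch. The age shares below are the ones quoted for the Xbox example; the per-cell support rates, the two-cell split, and the function name `poststratify` are hypothetical and chosen only for illustration. A real application would use many more cells and model-based cell estimates.

```python
def poststratify(cell_estimates, cell_shares):
    """Weighted average of per-cell estimates, weighted by each cell's share."""
    total = sum(cell_shares.values())
    return sum(cell_estimates[c] * cell_shares[c] for c in cell_shares) / total

# Age shares quoted in the text for the Xbox sample and the electorate.
sample_shares = {"18-29": 0.65, "30+": 0.35}
population_shares = {"18-29": 0.19, "30+": 0.81}

# Hypothetical support for some candidate within each age cell (assumption).
cell_estimates = {"18-29": 0.60, "30+": 0.45}

# Weighting by the sample's own composition reproduces the raw sample average.
raw = poststratify(cell_estimates, sample_shares)
# Weighting by the electorate's composition gives the poststratified estimate.
adjusted = poststratify(cell_estimates, population_shares)

print(f"raw sample average:      {raw:.4f}")
print(f"poststratified estimate: {adjusted:.4f}")
```

Because the hypothetical sample over-represents the younger cell, the raw average sits closer to that cell's value than the poststratified estimate does.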

One application is estimating preferences in sub-regions (e.g., states, individual constituencies) based on individual-level survey data gathered at other levels of aggregation (e.g., national surveys).[2]

The technique and its advantages

The technique essentially involves two steps. In the first, individual-level survey data are used to estimate the relationship between types of people, defined by characteristics such as age and race, and individual preferences (the multilevel regression). In the second, this relationship is combined with data from, for example, censuses on the number of people of each type in a sub-region to estimate the preference of that sub-region (a process known as "poststratification").[3] In this way the need to perform surveys at sub-regional level, which can be expensive and impractical in an area (e.g., a country) with many sub-regions (e.g., counties, ridings, or states), is avoided. It also avoids problems of inconsistency between surveys performed in different areas.[4][2] Additionally, it allows preference within a specific locality to be estimated from a survey taken across a wider area that includes relatively few people from the locality in question, or where the sample may be highly unrepresentative.[5]
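The two steps can be sketched as follows. This is a simplified illustration, not the procedure used in any of the cited studies: simple shrinkage of each cell's mean toward the overall mean stands in for a fitted multilevel regression, and the names `partial_pool`, `region_estimate`, and `prior_strength`, along with all of the data, are hypothetical.

```python
from collections import defaultdict

def partial_pool(responses, prior_strength=10.0):
    """Step 1 (stand-in for multilevel regression): shrink each cell's mean
    toward the overall mean; cells with few responses are shrunk the most."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for cell, outcome in responses:
        counts[cell] += 1
        sums[cell] += outcome
    overall = sum(sums.values()) / sum(counts.values())
    pooled = {
        cell: (sums[cell] + prior_strength * overall) / (counts[cell] + prior_strength)
        for cell in counts
    }
    return pooled, overall

def region_estimate(pooled, overall, census_counts):
    """Step 2 (poststratification): weight cell estimates by census counts.
    Cells absent from the survey fall back to the overall mean."""
    total = sum(census_counts.values())
    return sum(pooled.get(cell, overall) * n for cell, n in census_counts.items()) / total

# Hypothetical national survey: (cell, 1 if respondent supports the proposal else 0).
survey = [("young", 1)] * 6 + [("young", 0)] * 4 + [("old", 1)]

# Hypothetical census counts for one sub-region.
census_counts = {"young": 200, "old": 700, "unseen": 100}

pooled, overall = partial_pool(survey)
estimate = region_estimate(pooled, overall, census_counts)
print(pooled, round(estimate, 3))
```

The single "old" respondent would give a raw cell mean of 1.0; pooling pulls that estimate most of the way back toward the overall mean, which is the smoothing role the multilevel regression plays.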

History

The technique was originally developed by Gelman and T. Little in 1997,[6] building upon ideas of Fay and Herriot[7] and R. Little.[8] It was subsequently expanded on by Park, Gelman, and Bafumi in 2004 and 2006. It was proposed for use in estimating US-state-level voter preference by Lax and Phillips in 2009. Warshaw and Rodden subsequently proposed it for use in estimating district-level public opinion in 2012.[2] Wang et al.[1] subsequently used it for estimating the outcome of the 2012 US presidential election based on a survey of Xbox users, and it has also been proposed for use in the field of epidemiology.[5]

YouGov used the technique to successfully predict the overall outcome of the 2017 UK general election,[9] correctly predicting the result in 93% of constituencies.[10]

Limitations and Extensions

MRP can be extended to estimate changes in opinion over time.[4] When used to predict elections, it works best relatively close to polling day, after nominations have closed.[11]

Both the "multilevel regression" and "poststratification" ideas of MRP can be generalized. Multilevel regression can be replaced by nonparametric regression[12] or regularized prediction, and poststratification can be generalized to allow for non-census variables, i.e. poststratification totals that are estimated rather than being known.[13]

References

  1. ^ a b Wang, Wei; Rothschild, David; Goel, Sharad; Gelman, Andrew (2015). "Forecasting elections with non-representative polls" (PDF). International Journal of Forecasting. 31 (3): 980–991. doi:10.1016/j.ijforecast.2014.06.001.
  2. ^ a b c Buttice, Matthew K.; Highton, Benjamin (Autumn 2013). "How Does Multilevel Regression and Poststratification Perform with Conventional National Surveys?". Political Analysis. 21 (4): 449–451. doi:10.1093/pan/mpt017. JSTOR 24572674.
  3. ^ "What is MRP?". Survation.com. Survation. Retrieved 31 October 2019.
  4. ^ a b Gelman, Andrew; Lax, Jeffrey; Phillips, Justin; Gabry, Jonah; Trangucci, Robert (28 August 2018). "Using Multilevel Regression and Poststratification to Estimate Dynamic Public Opinion" (PDF): 1–3. Retrieved 31 October 2019.
  5. ^ a b Downes, Marnie; Gurrin, Lyle C.; English, Dallas R.; Pirkis, Jane; Currier, Diane; Spittal, Matthew J.; Carlin, John B. (9 April 2018). "Multilevel Regression and Poststratification: A Modeling Approach to Estimating Population Quantities From Highly Selected Survey Samples". American Journal of Epidemiology. 187 (8): 1780–1790. Retrieved 31 October 2019.
  6. ^ Gelman, Andrew; Little, Thomas (1997). "Poststratification into many categories using hierarchical logistic regression". Survey Methodology. 23: 127–135.
  7. ^ Fay, Robert; Herriot, Roger (1979). "Estimates of income for small places: An application of James-Stein procedures to census data". Journal of the American Statistical Association. 74 (366): 269–277. doi:10.1080/01621459.1979.10482505. JSTOR 2286322.
  8. ^ Little, Roderick (1993). "Post-stratification: A modeler's perspective". Journal of the American Statistical Association. 88 (423): 1001–1012. doi:10.1080/01621459.1993.10476368. JSTOR 2290792.
  9. ^ Revell, Timothy (9 June 2017). "How YouGov's experimental poll correctly called the UK election". New Scientist. Retrieved 31 October 2019.
  10. ^ Cohen, Daniel (27 September 2019). "'I've never known voters be so promiscuous': the pollsters working to predict the next UK election". The Guardian. Retrieved 31 October 2019.
  11. ^ James, William; MacLellan, Kylie (15 October 2019). "A question of trust: British pollsters battle to call looming election". Reuters. Retrieved 31 October 2019.
  12. ^ Bisbee, James (2019). "BARP: Improving Mister P Using Bayesian Additive Regression Trees". American Political Science Review. 113 (4): 1060–1065. doi:10.1017/S0003055419000480.
  13. ^ Gelman, Andrew (28 October 2018). "MRP (or RPP) with non-census variables". Statistical Modeling, Causal Inference, and Social Science.