Talk:Opinion polling for the United Kingdom European Union membership referendum: Difference between revisions

From Wikipedia, the free encyclopedia


::Last point: the sizes of the polls listed in the table conducted this month and in May range from 800 to 3405. The median size is 1703. You assume the correlation between poll size and the accuracy of predictions made on the basis of poll results is sufficiently significant to "require" to be taken into account if one is to make sensible predictions on the basis of a sizeable collection of polls. Do you seriously think that a hypothesis such as that "sample sizes between 800 and 1703 yield more accurate predictions for referendum results than sizes between 1704 and 3405" should be accepted? With how much confidence? Please consider the concept of false accuracy. Bear in mind that polling companies that have carried out polls for this referendum are currently concentrating their public relations spending far more on extolling sampling methodology and post-poll weighting methodology than on stressing how big a sample size they can supply. Why? I am trying to encourage you to allow a broader understanding of context to inform your statistical understanding. [[User:Elephantwood|Elephantwood]] ([[User talk:Elephantwood|talk]]) 12:15, 12 June 2016 (UTC)
:::No, what you did is fail to quantify the real error, or even model things properly. You have no training in research. That much is obvious. Stop using a firehose of terms you don't understand - I know you're hoping people will give in from the length. You can't combine polls estimating a parameter without tracking the error and correlation over time. Your model is wrong, your approach is wrong. There is no single standard deviation, no matter the declarative statements you make. Try mathematical statistics, measure theory and sigma algebra - hopefully you'll see the problems in your assumptions and go study it all for a few years.


== No support for the statement that most lawyers believe EU membership would be good for the country ==

Revision as of 13:44, 12 June 2016

Trendline

Does anyone know how the trendline in the chart is calculated? It doesn't seem to reflect the data to me, and at least two of the polls have "remain"+"leave" percentages without a "don't know" figure. I can't work out what trendline method it is using. Marlarkey (talk) 18:05, 8 March 2016 (UTC)[reply]

@Marlarkey: the trendline is calculated using the geom_smooth R function with standard parameters. The grey shadow represents a 95 % confidence interval. The fitting is calculated by the loess function. The same procedure was used for File:Referendum über die Unabhängigkeit Schottlands Entwicklung der Umfrageergebnisse.svg. You can have a look at File:UK EU referendum polling.svg#Summary for more details. Regards, -- T.seppelt (talk) 18:44, 8 March 2016 (UTC)[reply]
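For anyone wondering what geom_smooth with default parameters actually does at this sample size: it fits a LOESS curve, a locally weighted regression in which polls near a given date count more than distant ones. A rough pure-Python analogue may help illustrate the idea (the data is invented, and fitting a weighted local mean rather than a local polynomial is my simplification, not what R's loess does):

```python
# Illustrative LOESS-style smoother with invented data. The actual chart
# uses R's geom_smooth/loess with default parameters; this sketch fits a
# weighted local mean, which shows the same idea: nearby polls count more.

def tricube(u):
    """Tricube kernel: weight 1 at distance 0, falling to 0 at the window edge."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0

def smooth_at(x0, xs, ys, span=0.75):
    """Weighted mean of ys around x0; the window covers span * n points."""
    n = len(xs)
    k = max(2, int(span * n))
    h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-9  # local bandwidth
    w = [tricube((x - x0) / h) for x in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)

xs = [1, 2, 3, 4, 5, 6]        # e.g. poll dates (hypothetical)
ys = [40, 42, 41, 44, 43, 45]  # e.g. Remain % (hypothetical)
smoothed = [round(smooth_at(x, xs, ys), 1) for x in xs]
```

The `span` argument here plays the role of loess's default span of 0.75: widen it and the line flattens, narrow it and the line chases individual polls, which is exactly the arbitrariness discussed below.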
If the 95% confidence interval is so small, then there is something very questionable about your parameters. The interval seems at the moment to be about +/- 1.25%. No bookmaker would offer 19/1 against the Remain percentage, say, being outside such a narrow band. Perhaps the standard parameters to which you refer apply to the measurement of a fixed item? Elephantwood (talk) 23:28, 6 June 2016 (UTC)[reply]

Does the trendline take into account the relative sizes of the polls? Couldn't the sizes of the dots be relative to the size of the poll? Caspar (talk) 09:52, 20 March 2016 (UTC)[reply]

@Nonc: yes, of course. This should be possible. I'm not really an expert in R but I'll have a look at it, --T.seppelt (talk) 06:29, 21 March 2016 (UTC)[reply]

I think we should not have a trend line at all. It is obviously possible for the line to convey different impressions depending on what algorithm is used, and the implied assertion that a particular algorithm is appropriate seems to count as an "unpublished idea or argument" within the meaning of this site policy. So we should show only the scatter of points, and if the reader wants to draw conclusions about trends then that is their private decision. Unless, that is, we can cite a published analysis that uses some particular algorithm, in which case it is okay for us to generate a similar image even if their version is copyrighted. --Money money tickle parsnip (talk) 11:43, 6 April 2016 (UTC)[reply]

It seems sensible to me to keep it. Unless the reader is very used to scatter plots, it is very hard to discern a pattern for a moving trend with many outliers. I do agree though that the algorithm [and especially its provenance] should be explicit in the article rather than expecting the reader to go and look at the metadata for the svg. Is there an external reference for 'geom_smooth'? --John Maynard Friedman (talk) 13:32, 6 April 2016 (UTC)[reply]
I think it is possible for a trend line to fall within wp:calc - provided that "the result of the calculation is obvious, correct, and a meaningful reflection of the sources" - an average (mean) of the last X polls certainly would seem to fall in that criterion for me. DrArsenal (talk) 22:11, 9 April 2016 (UTC)[reply]
I would also like to keep the trendline. The documentation is available here. No special parameters are used. What you can see is a 95 % confidence interval. Regards, -- T.seppelt (talk) 06:42, 10 April 2016 (UTC)[reply]
I'd like to keep the trendline too, but not the 95% confidence interval. I also think the graph should show only percentages for Remain and Leave, as percentages of people sampled who said they would vote one way or the other. Just look at the spread for the figures for "Undecided" since the beginning of 2016. Is it sensible even to talk of a "trend" in those figures? Elephantwood (talk) 23:40, 6 June 2016 (UTC)[reply]
Over the past six months the Leave and Undecided lines have diverged in perfect symmetry, while Remain has dipped slightly. There is an obvious trend: Undecided voters are backing Leave, confounding expectations they would vote Remain (the "safe option"). Firebrace (talk) 01:17, 7 June 2016 (UTC)[reply]
That's a good point! Elephantwood (talk) 01:22, 7 June 2016 (UTC)[reply]

Creation of a "Lead" column

I'd like to add a "Lead" column to the tables on this page to make them consistent with the tables on Opinion polling for the Scottish independence referendum, 2014. I've coded and tested a program which edits the page for me, and I'm satisfied by the result. However, the change is significant and I would like the input of others before I pull the trigger. You can view how the 2016 results will look here: https://en.wikipedia.org/wiki/User:JDBushby/sandbox . JDBushby (talk) 18:35, 11 May 2016 (UTC)[reply]

The colours are a bit stark and hard on the eye; would the pastel ones be better? Keith Death (talk) 12:26, 12 May 2016 (UTC)[reply]
I changed to the pastels, which do look better. JDBushby (talk) 16:22, 12 May 2016 (UTC)[reply]
Looks good to me. I noticed one funny: clicking on the 'shrink' box [why have one? here?] only shrinks recent polls rather than all polls. --John Maynard Friedman (talk) 23:50, 16 May 2016 (UTC)[reply]
Another thing to look at is that if you order by "Lead", it groups together all the 1%s, 2%s etc. It might be better to use a data-sort-value attribute with negative values for either Leave or Remain leads so that the lead order makes sense. Otherwise, looks good! Keith Death (talk) 18:06, 25 May 2016 (UTC)[reply]
I'm in agreement that this should be used. Is there any reason it hasn't yet been implemented? 86.0.123.122 (talk) 23:03, 3 June 2016 (UTC)[reply]
I think the table is cluttered enough already and a lead column would not add much to our understanding of the polls. Firebrace (talk) 23:19, 3 June 2016 (UTC)[reply]
It would give a clearer view of which side was pulling ahead. It's eminently doable. We do it for all the other polling. 86.0.123.122 (talk) 16:14, 6 June 2016 (UTC)[reply]
We now have an average-of-last-eight-polls section which I think is more informative than a lead column would be. Firebrace (talk) 17:11, 6 June 2016 (UTC)[reply]
I think the average of the last eight polls is WP:Original Research and should be deleted. It fails the test in WP:CALC because it is not obvious. Why the last eight polls, not seven or nine or any other number? Is a simple arithmetic mean appropriate, or should it be weighted by sample size or recency? There is no objective answer to these questions. --Wavehunter (talk) 18:10, 6 June 2016 (UTC)[reply]
I wasn't sure about that either. I have changed it to the last six polls and added a reliable source. Firebrace (talk) 18:49, 6 June 2016 (UTC)[reply]
Much better! Thank you. --Wavehunter (talk) 18:55, 6 June 2016 (UTC)[reply]
It's always possible to ask questions such as "why this, rather than that?" Why refer to polls going back to 2010 rather than 2009 or 2000? Eight is a power of two. If we use a number such as three or four, the average will be too heavily influenced by the difference between the most recent poll and the one that drops out. Ten or twenty would make it not sensitive enough. We have to draw the line somewhere. We don't need a "reliable source" for what is a straightforward calculation from the data included in the article, and NatCen's idea of what the "last six polls" actually are may well diverge from ours. An average of the last eight polls on our list (to which I will change the section back) does not contravene either WP:Original Research or WP:CALC. The latter states that "The recursive use of routine calculations, such as summation, products of sequences, or the calculation of averages, also do not count as original research, when interpreted by the article's reader as a summary of numerical data — i.e. when used for well-known (and consensual) forms of 'numerical synthesis'. In this context, the synthesis of numerical data is not original research by synthesis." Yes, it's an arithmetic mean. Elephantwood (talk) 23:17, 6 June 2016 (UTC)[reply]
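For what it's worth, the calculation being argued over is trivial to state precisely. A sketch with hypothetical figures (the real numbers are in the article's table; these pairs are invented for illustration):

```python
# Hypothetical (Remain %, Leave %) headline pairs for the eight most
# recent polls in the table, newest first. Not real data.
recent_polls = [(44, 46), (43, 45), (45, 43), (41, 47),
                (46, 44), (44, 44), (42, 45), (45, 42)]

n = 8  # the contested constant: why eight polls rather than six or ten?
remain_avg = sum(r for r, _ in recent_polls[:n]) / n
leave_avg = sum(l for _, l in recent_polls[:n]) / n
```

The arithmetic itself is routine; the dispute below is about whether the choice of `n` and of an unweighted mean is itself an editorial judgment.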
There is no consensus that we use an arithmetic mean of the last eight polls (see discussion below). Therefore, it fails WP:CALC. --Wavehunter (talk) 18:47, 7 June 2016 (UTC)[reply]
As the polling average has been removed, may I refer everyone back to the OP and suggest again that the lead column be implemented, as is done for all other polling on Wikipedia. 82.46.156.222 (talk) 02:31, 12 June 2016 (UTC)[reply]

Business leaders, scientists, lawyers in lead

It is not cherry-picking to include these in the lead, as these are the only specific groups that are mentioned in the main body of the article. Absolutelypuremilk (talk) 12:03, 1 June 2016 (UTC)[reply]

The lede should be free from this sort of thing - smacks of expressing POV. Sumorsǣte (talk) 19:02, 6 June 2016 (UTC)[reply]
See WP:LEAD. The lead summarises the article. The polls are close, and professionals think the UK's membership of the EU is beneficial. That covers two of the three main sections of the article, which are standard polling, polling within professional groups, and other opinion polling. I guess we should also mention that the majority of non-British EU citizens would prefer it if the UK remained... Firebrace (talk) 20:06, 6 June 2016 (UTC)[reply]
Business leaders talk for the business sector; lawyers talk for their clients or possible clients; scientists do science. They are not "professionals" at talking for the country. The people who are, much as they are held in contempt by many, are the elected politicians, who are of course divided. The article should not convey the message that "professionals at talking about what's best for the country are overwhelmingly agreed that EU membership is good for the country", because that isn't true. And as for non-British EU citizens, the only polls I've read about asked whether they think Britain "should" stay in the EU. Are you sure that question meant would they prefer it? Elephantwood (talk) 23:55, 6 June 2016 (UTC)[reply]
  • Polls of business leaders, economists, etc. have been all over the news for weeks; if reliable sources are covering it, then we should too.
  • Ashcroft question: Would you prefer to see the UK remain a member of the EU, or would you prefer the UK to leave, or does it not matter to you either way? [1]
  • The Guardian article on the ICM poll uses the words favour and prefer. [2]
  • This article is averaging 15,000 views per day – nothing compared to the tens of millions watching the news and reading biased newspapers. If the article is biased, and I don't believe it is, then it will make no difference to the outcome of the referendum... Firebrace (talk) 00:19, 7 June 2016 (UTC)[reply]
I don't know what words the University of Edinburgh survey used in all the different languages, but in English translation they use the word "should".Elephantwood (talk) 01:37, 7 June 2016 (UTC)[reply]
Even if so, it should still not be biased. I don't think it is especially biased. But when an organisation such as the Confederation of British Industry, the body of industrial employers, gets involved in politics, we should surely take what they say as being on behalf of their members, who are a minority interest group and who are not and do not claim to be representative of the country as a whole. They are not qualified to express a view on what's good for the country any more than, say, lorry drivers or unemployed people are. Some of those 15,000 views are going to be by journalists. Many of the lazier ones use Wikipedia as their main "research" source, and given that Remain and Leave are running more or less equally well in the polls, the article could have an appreciable effect. But anyway, it's quite good as it stands, and it's good that there are some sensible people contributing to it. Elephantwood (talk) 01:35, 7 June 2016 (UTC)[reply]

Two points. Including the mention of the 3 "professional" groups in the lead section is fine, but there is no mention of pro-Brexit groups. This is surely biased, as the general opinion polls at the moment put leave/remain at level pegging. Where are the other professions? Cherry-picking these three groups to represent professionals is wrong; they all go for remain. Firebrace, Wikipedia articles should be neutral whether they are popular or not. By your logic, rarely accessed articles could have bias, whereas popular articles would have to be stricter. Besides, where would you define the cut-off point for such twisted logic? ArticulateSlug99 (talk) 01:36, 7 June 2016 (UTC)[reply]

Of course Wikipedia should be neutral, but there is no objective way of measuring bias, so we can never really be sure if the article is biased or not. All we have to go on are people's feelings. Some people think it is biased, and some don't. I'm saying that if the article actually is biased then at least it will have no effect on the outcome of the referendum. What are the pro-Brexit groups? If you can find some, add them... Firebrace (talk) 12:32, 7 June 2016 (UTC)[reply]
The third and fourth sentences make the claim that "There is a contrast between opinion polls of voters in general, which tend to be close, and polls of business leaders, lawyers, and scientists. In all three groups, clear majorities see membership of the EU as beneficial to the UK." This rather oddly privileges three particular professions over a great many others. Why not include the views of say, economists, medical professionals and entrepreneurs? -or law enforcement professionals, farmers and retailers? Bricology (talk) 02:50, 11 June 2016 (UTC)[reply]

3 June Opinium Poll

There appear to be two versions of the 3 June Opinium poll: one showing Remain 43% vs Leave 41%, currently used in this Wiki page and supported by tables published by Opinium, and another showing Remain 40% vs Leave 43%, published by the Observer/Guardian. The explanation seems to be a change in the weighting used by Opinium, with the newspaper choosing to use the old methodology for their article, and the pollster using the new one for their tables. 82.1.16.12 (talk) 13:42, 5 June 2016 (UTC)— Preceding unsigned comment added by 82.1.16.12 (talk) 07:41, 5 June 2016 (UTC)[reply]

Indeed you are correct. I would say it is up to the pollster to choose what methodology should be used for their headline percentages. In this case Opinium were trying to point out that the new methodology (not alternative, as it currently says in the article) masks a larger swing towards Leave by calculating the headline numbers on the old methodology. Presumably, had this methodology been used in the previous poll, the comparison in this one wouldn't have been required. I think the numbers using the old methodology should be removed, and the note changed to say there were methodological changes. Andymmutalk 19:52, 5 June 2016 (UTC)[reply]
Headline percentages are not data; they are similar to predictions based on the data. The pollster can, and in this case did, issue a document containing headline percentages based on a new weighting methodology, but I don't think we should give that more weight than we give to the poll commissioner's headline percentages, which were based on the existing methodology. The "new" methodology here is an "alternative" one. That does not mean it is less valid. But it is not more valid either. To posit a difference in validity would be to adopt a non-neutral point of view, so I support reporting both sets of percentages as we are doing at the moment. Elephantwood (talk) 14:08, 6 June 2016 (UTC)[reply]

Red-green color blind accessibility

There is an accessibility issue with using red and green in the shades currently used. Is there any way to link to a patterned version of the chart rather than depending on color as the distinguishing trait? I am familiar with R graphing techniques and, given access to the data, could generate the above myself and provide a version to link. Specifically I would change the shape of each of the respective variables; see the ggplot documentation for geom_point. This sets by color (as argued previously), size (as suggested previously), and shape. Jitter is another fix for the cases where data is subject to overplotting. Militärwissenschaften (talk) 10:57, 12 June 2016 (UTC)[reply]

Looking at the chart in Photoshop using the red/green color blindness simulator, 'Remain' appears yellow and 'Leave' appears brown. There does not seem to be any issue for people with protanopia or deuteranopia... Firebrace (talk) 14:31, 6 June 2016 (UTC)[reply]

8 Poll Average

I would argue that a simple average of polls is a violation of the Valid Routine Calculations rule.

While the calculation method is simple the data being averaged is not directly comparable thus making any simple calculation either meaningless, misleading, or both.

I can illustrate as follows (at the time of writing the averages are 49.0/51.0 Remain/Leave):

1) Publishing date: Is a poll conducted on 24th May more or less recent than a poll conducted between 20-25 May? The alternative is to go by publishing date but as some polls (such as TNS) take longer to publish than others the data will be out of date before it is included.

2) Sample size: If we are performing a simple average then polls that sample more people should be given more weight. If we do this the simple average becomes 48.8/51.2.

3) Decimal Places: If we go into each poll and extract the real numbers rather than the published figure which is usually rounded to the nearest whole number the simple average becomes 49.6/50.4

4) Methodologies: This week we have a great example of a polling company that has recently changed methodology. They only published one official figure but the newspaper that commissioned the report published a different figure based on their previous methodology. If we just include the published methodology the simple average becomes 49.2/50.8, if we go by the original methodology then it is 48.9/51.1.

5) Don't knows/Undecideds: Each polling company publishes figures with different approaches to people not giving an answer. For example, YouGov excludes those certain not to vote, ICM includes people certain to vote, and ORB only includes those very likely to vote. As a result, if you are undecided about whether you will vote or not then you are a YouGov undecided but not an ICM undecided. Apologies if I have misinterpreted the polling approaches, but this is just one of many methodological differences between the polling houses. This is reflected in the fact that YouGov figures do not add to 100% whereas others do. This means that we cannot make a simple average without making a judgment call.
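Point 2 above is easy to demonstrate mechanically: an unweighted mean and a sample-size-weighted mean of the same headline figures generally differ. A sketch with invented polls (the specific numbers are hypothetical, not taken from the table):

```python
# Hypothetical polls: (Remain %, Leave %, sample size). Not real data.
polls = [(49, 51, 800), (52, 48, 2000), (47, 53, 3400)]

# Simple (unweighted) mean of the headline Remain figures
simple = sum(r for r, _, _ in polls) / len(polls)

# Mean weighted by sample size, treating pooled respondents as one sample
total_n = sum(n for *_, n in polls)
weighted = sum(r * n for r, _, n in polls) / total_n
```

Here the unweighted mean comes out near 49.3 and the weighted one near 48.9, so the headline "average" moves depending on a choice the reader never sees.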

As an alternative I would either recommend publishing poll averages from other sources such as Poll-of-Polls or similar. The difficulty there is that each one is different for all the reasons outlined above.

Another alternative would be to use something similar to what was on the UK General Election page leading up to the vote last year where there was a table of forecasts, predictions and poll averages.

In any case, while I see the benefit of providing some measure of current opinion that evens out the kinks of several different polls, I think that a supposedly simple measure that could give Remain a figure of 49.0, 48.8, 49.6, 49.2 or 48.9 (or indeed many others) is ambiguous, and neither obvious nor a meaningful reflection of the sources. For that reason I think it does not constitute a routine calculation and should be removed.

What do you think?

Mykums (talk) 15:07, 7 June 2016 (UTC)[reply]

Yes, I agree. There are simply too many differences in methodology for us to do this in any objective way. It is simpler to remove it and let people see the individual polls themselves. Absolutelypuremilk (talk) 15:38, 7 June 2016 (UTC)[reply]
I also have to agree. We don't know which of several possible methods User:Elephantwood is using to calculate the averages; it therefore fails WP:V because the data cannot be verified, and given that reliable sources are all using different methods, using one method in particular is WP:UNDUE... Firebrace (talk) 16:27, 7 June 2016 (UTC)[reply]
As I have argued above, I think it should be removed. --Wavehunter (talk) 18:47, 7 June 2016 (UTC)[reply]
Even more relevantly, we have a reliable third party source (* EU referendum poll of polls  – What UK Thinks: EU) that does it for us, so we have no excuse for inventing our own. Per wp:nor and the discussion above, I am deleting that section. --John Maynard Friedman (talk) 20:50, 7 June 2016 (UTC)[reply]
John Maynard Friedman's argument doesn't stand up. Of course percentages are comparable, and of course the average is not "meaningless". Date: order by the time the last data point was collected. Sample size and methodologies: ignore. The point that the percentages are comparable overwhelms these points which could be used to undermine many averages. The point of an average here is precisely to smooth over the different influences on each of the data points. That's usually what an average is for. Did you realise that the trend line shown on the graph at the top of the article expresses an average? Why don't you argue against that? Your argument doesn't support your proposition, and you've failed to argue it from the Valid Routine Calculations "rule". Perhaps if you didn't insist in such an absolutist way on uncomparability and meaninglessness you might be able to construct a case, but you haven't presented a robust one so far. But since you have removed the section without even waiting to hear any oppositional response to what you wrote, even after I specifically invited people to discuss the issue here on the talk page, I am not going to add it back and take part in an edit war. This is only Wikipedia, and it's only a referendum on British EU membership. Elephantwood (talk) 21:29, 7 June 2016 (UTC)[reply]
WP:NOR doesn't apply to images. Also, two reliable sources, The Telegraph and What UK Thinks: EU, are both showing Remain at 51% and Leave at 49%, while your table was showing them at 49.8% and 50.2% respectively... Firebrace (talk) 22:03, 7 June 2016 (UTC)[reply]
Of course WP:NOR applies to images. I have no idea why you refer to "reliable sources", as if there's a question about what sources are or are not reliable for a statement. My preference is for an arithmetic mean of the percentages - of those who expressed an intention to vote - that are included in the most recent eight entries in the table in this article. There is no reliability issue. @Firebrace, so the average of the last eight polls thought worthy of inclusion here is different from some average of polls calculated some different way by someone else - so what? What I want is a summary of what's here, to give people a good handle on it. It would be well within WP policy and practice. But if the speciousness problem is raising its head, as it often does on WP, then...well it's only Wikipedia. The current "Polls of polls" section is ridiculous, in the way that in a single column it lists figures for Leave and Remain that are percentages of those who have said they will vote, together with figures that are percentages of those people plus don't knows, won't votes, won't says, or whatever. Columnisation suggests a comparability that isn't present. This is a shame, because data that is comparable is included below in this very article. But that's all from me. WP has low standards - what else is new? So now you get four "averages" showing Remain ahead, and one showing Leave ahead, whereas the data below in this very article suggests that Leave is ahead. But it's required here to assume good faith, yes?
As for WP:NOR and images, would you be OK with a graph showing the average of the last eight polls in the list, as percentages of those who expressed an intention to vote? Just not text, right. And you're only saying what's in the rules, right? ROFL Elephantwood (talk) 16:59, 8 June 2016 (UTC)[reply]
I think Elephantwood (talk) does make a valid point about the purpose of averages to reduce the significance of individual house effects. I am not comprehensively opposed to the idea of averaging polls. However, as I'm sure we all appreciate, it is such an emotive subject that we should try and remove any possibility of bias, and without an agreed methodology there will always be that possibility. In a campaign that has seen so many people desire "hard facts" clear of political rhetoric, it is perhaps an "absolutist" view of factual representation that should win through. Finally to the point raised about the excellent trend graph provided by T.seppelt (talk). I think that so long as the actual data points are included and the trendline is gradual then what it shows is the long term movement. This longer term view will achieve the smoothing out of different house effects that an average should achieve. It also addresses another point (one I forgot in my earlier post), i.e. why 8? Obviously the fewer you take the more up-to-date the figure is, the more you take the less susceptible it is to anomalies. However with eight you may have, for example, 0, 1, 2 or 3 polls from Yougov. This means that an 8 poll average would be influenced by the timings of polling houses releases as much as by the actual shifting of public opinion. With the long term trend line this further house effect is mitigated. Mykums (talk) 07:50, 8 June 2016 (UTC)[reply]
"so long as the actual data points are included and the trendline is gradual then what it shows is the long term movement. This longer term view will achieve the smoothing out of different house effects that an average should achieve." May I suggest that you find out the meaning of the term "moving average". The graph shows a moving average, as does the section that has now been removed. The one used in the graph may perhaps be an exponential one. Whatever kind of moving average it is, one or more constants will have been chosen that could have been chosen differently, so your argument simply doesn't work. If you do not accept that point, then you are misunderstanding averaging. The graph also shows an extremely tight 95% confidence interval which is completely ridiculous, given the spread of poll data, given recent polling history, and given that we are not dealing with measurements of a (fixed or even changing) physical quantity. That said, I would agree that the graph looks nice. Elephantwood (talk) 18:36, 11 June 2016 (UTC)[reply]
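The point about arbitrary constants can be made concrete: a simple moving average depends on a window length and an exponential one on a smoothing constant, and different choices trace different "trends" through the same data. A sketch (the series is invented):

```python
# Two common moving averages, each with a tunable constant. Different
# choices of window or alpha produce different trend lines from the
# same (invented) series of Remain percentages.

def sma(xs, window):
    """Simple moving average: mean of up to `window` most recent values."""
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def ema(xs, alpha):
    """Exponential moving average with smoothing constant alpha in (0, 1]."""
    out = [xs[0]]
    for x in xs[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out

series = [40, 42, 41, 44, 43, 45]
a = sma(series, 3)   # three-poll window
b = sma(series, 5)   # five-poll window: a different line
c = ema(series, 0.5) # different again
```

None of these is "the" trend; each is one defensible summary among many, which is the nub of the WP:CALC dispute.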
Look at the table and go down it, writing down the leads for Leave or Remain and ignoring the poor comparability (easily surmountable) that comes from the inclusion of DK/WV/WS in some rows but not others:
10% Leave lead, 1% Remain lead, 5% Leave lead, 12% Remain lead, 4% Leave lead, 2% Remain lead, 3% Leave lead, level pegging, 2% Leave lead, 3% Leave lead
That's 6-3 for Leave with one draw, and an average (arithmetic mean) of a 1% Leave lead. But the stupid section on "polls of polls" gives
2.6% Remain lead, level pegging, 1% Remain lead, 2% Remain lead, 1% Leave lead, 2% Remain lead
That's 4-1 for Remain with one draw, and an average of a 1.1% Remain lead.
The bias is obvious. Leave are clearly ahead in the polls, and this page is saying that Remain are ahead. Elephantwood (talk) 12:27, 11 June 2016 (UTC)[reply]
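The arithmetic in that comment can be checked mechanically from the leads as listed (Leave leads positive, Remain leads negative, level pegging zero); it gives 6-3 to Leave with one draw, and a mean Leave lead of 1.2 points, i.e. roughly the 1% stated:

```python
# Signed leads from the ten polls listed above: Leave positive,
# Remain negative, level pegging zero.
leads = [10, -1, 5, -12, 4, -2, 3, 0, 2, 3]

leave_ahead = sum(1 for x in leads if x > 0)
remain_ahead = sum(1 for x in leads if x < 0)
mean_lead = sum(leads) / len(leads)
```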
  • I've added some polls of polls from reliable sources. These should be more trustable than a crude rolling average. Smurrayinchester 15:53, 8 June 2016 (UTC)[reply]

ORB polling

As is currently being discussed through edit summaries, there is some confusion over what is the headline result of the ORB polling. For example, in the last poll, the headline result was significantly different to what was reported in the Telegraph (which looked at only those certain to vote). As ORB seem to have previously published the latter, this makes for an easier comparison with other ORB polls, but difficult to compare with other polls. Thoughts? Absolutelypuremilk (talk) 13:02, 8 June 2016 (UTC)[reply]

Per WP:WPNOTRS, we should use the results published by secondary sources... Firebrace (talk) 14:30, 8 June 2016 (UTC)
Also note that, per WP:V, material does not have to be attributed to a secondary source, just "attributable"... Firebrace (talk) 15:20, 8 June 2016 (UTC)
If "we should use the results published by secondary sources", should we then reference the secondary source (e.g. The Telegraph) rather than the primary source (e.g. ORB Polling)? Even leaving aside the question of whether any UK newspaper in this regard meets the WP:WPNOTRS criterion of "reliable secondary source", this hardly seems desirable in this context. Pseudoneiros (talk) 15:32, 8 June 2016 (UTC)
No, see above. Firebrace (talk) 15:38, 8 June 2016 (UTC)
Number Cruncher Politics says that ORB has clarified that the "all voters" poll is their headline, although I agree that it's a bit of a problem to simply change methodologies halfway through the table. Perhaps it would be best to split them, like we did for the Observer poll (like the Telegraph, the Observer only used the numbers from the old methodology). Smurrayinchester 15:57, 8 June 2016 (UTC)

Article title

Can the year 2016 please be added to the title of the article, as the referendum is taking place on 23 June 2016? 46.65.75.27 (talk) 13:04, 8 June 2016 (UTC)

Why? There has only been one referendum on the UK's membership of the EU. The 1975 referendum concerned the UK's membership of the Common Market. If there had been two referendums, I would agree to changing the title, but there is no reason to disambiguate. Firebrace (talk) 14:13, 8 June 2016 (UTC)

Andrew Neil Interviews

You state TV interviews on Sky and ITV, but what about the Andrew Neil 1-1 interviews on BBC 1? This is peak viewing time and would get far higher audiences than Sky. (Coachtripfan (talk) 11:20, 9 June 2016 (UTC))

ORB Weighting

Is it worth noting that the latest ORB poll giving Leave a ten point lead gives them only a 6 point lead (53% to 47%) when weighting on likelihood to vote is not applied? 82.46.156.222 (talk) 19:44, 10 June 2016 (UTC)

Yes, yet again the ORB figures have been entered incorrectly on the basis of a newspaper report rather than the poll itself (see TALK above). ORB is clear that its headline figure is 47/53 (not turnout-adjusted 45/55). In comparison to previous ORB polls listed, the table is currently not comparing like with like.Pseudoneiros (talk) 09:56, 11 June 2016 (UTC)
Did it ever compare like with like? To be serious, it would show predicted percentages for Remain and Leave expressed as percentages of those who say they will vote - because the actual percentages on the day will be such percentages, and it's those percentages that these figures are predicting. They should be weighted more finely according to expressed likelihood of voting, or other tweaking, if that's what's being widely reported. But there's no way the bad faith people here will stand for that.Elephantwood (talk) 11:37, 11 June 2016 (UTC)
That's fine – thanks. Previously an anonymous user had simply reversed the 45–55 figures, which was the reason for my change. Ondewelle (talk) 11:14, 11 June 2016 (UTC)
There is so much speciousness and bad faith here that it's hard to want to take part. ORB don't do "headlines". They are a polling company. They report to whoever commissioned the poll. The headlines all give the weighted figure: Reuters, the Daily Mail, the Independent, etc. Weighting the results is actually similar to weighting the sample. So why not go and unweight all the samples, or just delete the page because that would be "unobjective"? There is no objective probability of what the result will be. The referendum is not a race. Percentages based on polls are predictions, even if they are totally unweighted. Obviously the weighted figures, giving a 10% lead for Leave, should be reported here. Making your own prediction based on de-weighting the samples is about as subjective as subjective can be. There are those who argue otherwise because they are biased; there may possibly be some who argue otherwise because they do not understand. This page is getting to be a joke. But still, it's only Wikipedia, so no problem; let it.Elephantwood (talk) 11:24, 11 June 2016 (UTC)
Pseudoneiros, could you possibly link to where ORB say that? Thanks Absolutelypuremilk (talk) 11:44, 11 June 2016 (UTC)
The "headline figure" (pollsters' terminology) in ORB polls is used by all professional pollsters (e.g. http://www.ncpolitics.uk/uk-eu-referendum/). Also Kellner, Curtice etc. For ORB clarification of its poll figures see also the comment above by Smurrayinchester under ORB Polling. What newspapers choose to report is irrelevant. If, however, the consensus here is to use the predicted-turnout-weighted figure on ORB polls, then that's perfectly fine with me. But, in that case, all the ORB polls must be presented as so weighted, not just some of them. Without consistency, the table compares apples with oranges.Pseudoneiros (talk) 12:05, 11 June 2016 (UTC)

Colour Bias

I would argue that green for remain and red for leave is somewhat biased and isn't very helpful with impartiality. Green usually represents positivity and red represents negativity. Should these be changed? (86.17.120.75 (talk) 02:03, 11 June 2016 (UTC))

Yes: green is soothing and easier to look at for longer periods (that's why they use it in hospitals); red represents anger and discordance and is harder to look at for long periods. I imagine the colours were chosen for precisely that reason: Remain bias. You won't be able to persuade people to want to change them, though. They will argue the hind legs off a donkey to keep them as they are, or if they change them, to keep them as bad or make them worse. I'd advise against wasting your time. I gave up when someone kept a straight face and said the average of the last eight polls here, which is very easy to work out, shouldn't be included, and then when I pointed out that an average (together with an utterly ridiculous version of a 95% confidence interval) is displayed on the graph at the top of the page, he said the rule against "original research" didn't apply to "images". Best not to argue with such people, my friend.Elephantwood (talk) 11:24, 11 June 2016 (UTC)
Actually, I think they were just chosen by copying the tables and colour scheme used in the Scottish independence referendum article (where there was a yes/no answer - at the time this article was created, it was believed that it would also be a yes/no question, and it was only later changed to leave/remain) rather than a deliberate scheme to bias the election. I wouldn't object to changing it, although there's nothing especially obvious that presents itself. Smurrayinchester 13:22, 11 June 2016 (UTC)

"Life experience"

Someone keeps adding a sentence that older people are more likely to vote for Brexit due to their life experience. This is just an opinion so please leave it out. Otherwise, I can add my own opinions, e.g. that older people are also more likely to be senile, or that less educated people (who are more likely to vote for Brexit) are likely to be less intelligent. — Preceding unsigned comment added by Zurich gnome (talkcontribs) 14:10, 11 June 2016 (UTC)

Yes, I agree, it has no place here. Absolutelypuremilk (talk) 16:23, 11 June 2016 (UTC)

10 June Opinium Poll

Someone has just added the latest Opinium poll. I see a few problems with it. The poll was published on 10.06 but was actually conducted earlier, so it should be further down the table based on when it was conducted. Now I am going to the Opinium page to find the correct dates, and they are all over the place:

  1. On the web page (http://ourinsight.opinium.co.uk/survey-results/political-polling-7th-june-2016) it says Opinium Research carried out an online survey of 2,009 UK adults aged 18+ from 7th to 10th June 2016 to 3rd June. What does this 3rd June mean?
  2. On the PDF with the detail results it states: FIELD DATES | 9 to 12 January 2016 Huh??

I think Opinium has really messed this up. — Preceding unsigned comment added by Zurich gnome (talkcontribs) 18:11, 11 June 2016 (UTC)

Average of last eight polls listed in this article

As a yardstick against which interested editors and potential editors can measure the bias present in the current "Polls of polls" section, here are the current average percentages that Remain and Leave have attained in the eight most recent polls listed in the table, expressed as proportions of those who intend to vote (obviously the most reasonable basis for comparison) (note that the Opinium poll conducted between 31 May and 3 June has been counted just once, with equal weighting given to both methodologies):

Dates conducted | Remain | Leave
30 May–9 June   | 49.0%  | 51.0%

The standard deviation of the Remain results, which is obviously the same as for the Leave results, is 2.0%. Multiplying by 1.96 gives 3.9% and a 95% confidence interval for Remain of 49.0% ± 3.9%; in other words, 45.1% to 52.9%; or, rounding, 45-53% for Remain and 47-55% for Leave. (If anybody thinks this kind of calculation shouldn't inform any part of the article, note that it does inform the graph, although the graph shows a confidence interval that is ridiculously narrow.)
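The arithmetic in that step can be reproduced directly from the summary figures quoted above (mean 49.0, standard deviation 2.0), as a rough sketch using the normal approximation:

```python
mean_remain = 49.0  # average Remain share over the eight polls (from the text above)
sd = 2.0            # standard deviation of those eight Remain shares (from the text above)

half_width = 1.96 * sd         # 95% multiplier under a normal approximation
lo = mean_remain - half_width  # about 45.1
hi = mean_remain + half_width  # about 52.9
print(lo, hi)
```

Rounding gives 45-53% for Remain, hence 47-55% for Leave, matching the interval quoted.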

Doing the same for the polls of polls listed in the article gives intervals of 49.2% to 52.0% for Remain and 48.0% to 50.8% for Leave; or, rounding, 49-52% for Remain and 48-51% for Leave.

In summary, the polls indicate great uncertainty as to which side will win. They also indicate that the winning margin will with more than 95% probability ("more than" because the mean in neither case is 50%) be between 0% and 10% (my calculations) or 0% and 4% ("group wisdom at Wikipedia's" calculations). Elephantwood (talk) 01:33, 12 June 2016 (UTC)

What is your statistical background? You have a vague reference to standard deviation but neglect the impact of sample size on its estimation among the grouped polls, insist on an average being the best measure, and use only the most basic placeholder for normal distribution CDF. I think you have a bias you are favoring rather than making an argument from statistical reasoning. This unfortunately is such a charged issue even basic visualization issues are ignored, but your points are hand waving with poorly constructed "equations" that are extremely (overly) simplified. First issue is that sampling methods are in fact crucial to combining the data, each poll you've averaged has it's own independent standard deviation, and combining them requires use of significantly more complex equations for pooled variance. Militärwissenschaften (talk) 10:58, 12 June 2016 (UTC)[reply]
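For what it's worth, the pooled (inverse-variance weighted) combination this reply alludes to can be sketched as follows. The poll shares and sample sizes below are hypothetical placeholders, not the actual polls in the table, and the sketch accounts for sampling error only, not house effects or movement over time:

```python
import math

# Hypothetical polls: (Remain share, sample size) -- illustrative values only,
# not the real polls under discussion
polls = [(0.49, 2000), (0.51, 1000), (0.47, 3000)]

def sampling_variance(p, n):
    # Sampling variance of a single proportion estimate
    return p * (1 - p) / n

# Inverse-variance weights: larger, tighter polls count for more
weights = [1 / sampling_variance(p, n) for p, n in polls]
pooled = sum(w * p for w, (p, _) in zip(weights, polls)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(pooled, pooled_se)
```

With these placeholder numbers the pooled estimate comes out near 0.48 with a standard error under 0.01, much tighter than any single poll; the dispute in this thread is precisely over whether that extra tightness is real once non-sampling error is considered.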
What I did is avoid false accuracy. It's called good engineering. Take sample size into account and use a sensible attitude to accuracy as I have done, and you will reject the "one side is more likely to win" hypothesis, as I have done, and you will come up with a 95% confidence interval for the winning margin of somewhere around 0%-10% as I have done. Why don't you actually do it and find out? One thing you don't mention is that I assumed normality. That too is to avoid false accuracy. There are no "visualisation" issues; I don't know who you are quoting when you put the word "equations" in inverted commas; "its" as a possessive does not have an apostrophe; and there is a big issue with your thinking that each poll has its own standard deviation, which comes from your ignoring that polls do not measure a physical entity, whether fixed or even variable, despite the impression that they do that's given in the media. I'd examine your use of the word "requires" in your final sentence if I were you. Ascription of "over-simplification" is not sensible here. Please use the higher level of accuracy and complexity that you clearly think is "required", and consider what accuracy is reasonable to use when stating your prediction and your 95% confidence interval, and post your own results for comparison. Thanks.
PS If you end up thinking that the 95% confidence interval is only about 2.5% wide, as is implied by the shadow band on the graph at the top of the article, perhaps we can each stand by our analyses and have a wager? I would be happy to place a bet at 19/1 or even 15/1 against the result being in any confidence interval you care to mention that is only 2.5% wide: in other words, I win 15 units if it's outside such an interval, and you win 1 unit if it's inside. If you really believe that the probability is 95% that it will be inside such a narrow interval, then of course you should also believe that your expected gain from the bet would be positive.
To summarise: I'm saying the polls at the moment don't suggest that either side is more likely to win than the other, and I'm saying they suggest there's 95% probability that the winning margin will be less than 10% and 5% probability that it will be more than 10%. Let's have your figures please, based on what you clearly think is the requirement to apply a more complex analysis than the one I have applied.Elephantwood (talk) 12:15, 12 June 2016 (UTC)[reply]
Last point: the sizes of the polls listed in the table conducted this month and in May range from 800 to 3405. The median size is 1703. You assume the correlation between poll size and the accuracy of predictions made on the basis of poll results is sufficiently significant to "require" to be taken into account if one is to make sensible predictions on the basis of a sizeable collection of polls. Do you seriously think that a hypothesis such as that "sample sizes between 800 and 1703 yield more accurate predictions for referendum results than sizes between 1704 and 3405" should be accepted? With how much confidence? Please consider the concept of false accuracy. Bear in mind that polling companies that have carried out polls for this referendum are currently concentrating their public relations spending far more on extolling sampling methodology and post-poll weighting methodology than on stressing how big a sample size they can supply. Why? I am trying to encourage you to allow a broader understanding of context to inform your statistical understanding.Elephantwood (talk) 12:15, 12 June 2016 (UTC)[reply]
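As a sanity check on how much sample size matters over the range mentioned above (800 to 3405, median 1703), the sampling-only margin of error at the 95% level can be computed, assuming the worst case p = 0.5:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    # 95% sampling-only margin of error for a proportion, in percentage points
    return 100 * z * math.sqrt(p * (1 - p) / n)

for n in (800, 1703, 3405):
    print(n, round(margin_of_error(n), 1))  # 3.5, 2.4 and 1.7 points respectively
```

So doubling the sample from about 1700 to about 3400 shrinks the sampling-only error from roughly 2.4 to 1.7 points, a modest gain next to the sampling and weighting methodology differences being debated in this thread.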
No, what you did is fail to quantify the real error, or even model things properly. You have no training in research. That much is obvious. Stop using a firehose of terms you don't understand - I know you're hoping people will give in from the length. You can't combine polls estimating a parameter without tracking the error and correlation over time. Your model is wrong, your approach is wrong. There is no single standard deviation, no matter the declarative statements you make. Try mathematical statistics, measure theory and sigma algebra - hopefully you'll see the problems in your assumptions and go study it all for a few years.

No support for the statement that most lawyers believe EU membership would be good for the country

At the moment the article says that "In all three groups [business leaders, lawyers and scientists], clear majorities see membership of the EU as beneficial to the UK." But the source cited only suggests that a single poll has indicated that a majority of lawyers think EU membership would be beneficial for their own firms and for the City of London's position in financial markets. That is not at all the same as them saying they think membership would be good for the country. I will change the second paragraph to make this clear. I also wonder why a solicitor in Newcastle whose work is mostly conveyancing, together with some remortgaging, advice on making wills, and defending or prosecuting alleged shoplifters, would be likely to have more basis for his opinion on the importance of EU membership to the position of the City of London in global financial markets than a member of the British population chosen at random.Elephantwood (talk) 12:29, 12 June 2016 (UTC)