Talk:Nationwide opinion polling for the 2020 Democratic Party presidential primaries/Archive 1

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

National polling

The "National Polling" table seems to be inaccurate, at the very least in the top two lines, which are currently Morning Consult/Politico polls. I noticed that the Biden and Bernie percentages seemed incorrect, but many more things seem wrong: The sample size is close to 2,000 -- clearly noted at the top of each crosstab document that the wiki page links to -- and also, for instance, Hillary Clinton was not even a response option in the Jan 4-6 poll yet she has "12%". Can this table be cleaned up? 38.134.125.24 (talk) 20:25, 21 January 2019 (UTC)

That's not an error: these numbers are among Democrats only, consistent with the first Morning Consult/Politico poll. The text of the file is cut off and as such there are candidates who were asked but aren't visible on the page; I extracted the results for any candidates off of the page, which are shown on this article (if you ctrl+F "Hickenlooper", for example, you'll find that his name is in the document but off-screen). (The full list of the columns in the January 4-6 poll is as follows: Kerry, Biden, Sanders, Warren, O'Rourke, Booker, Harris, Brown, Klobuchar, Gillibrand, Holder, Cuomo, Delaney, Castro, Inslee, Garcetti, Bullock, Schultz, Bloomberg, Hickenlooper, Clinton – with a hyphenated line break – McAuliffe, Other, Don't Know / No Opinion, Total N.) Note that this question was asked of all respondents, including self-identified Republicans. The results which were circulated by Politico for their November 2018 poll were among only self-identified Democrats, and while they haven't since published an article on their primary polling, the question continues to be asked and the results buried in their crosstab files. The sample size you're referring to, then, is that among all respondents, not just Democrats. Mélencron (talk) 21:10, 21 January 2019 (UTC)

Eric Holder, others

I've moved the results for Eric Holder from a separate column into "Others", not for not meeting the five-poll threshold, but for consistently low or absent polling results. That's what the Others column is for and we don't need the table getting too big. Otherwise it would be a very low inclusion criterion where simply being polled is enough, and that doesn't happen anywhere else on Wikipedia. Onetwothreeip (talk) 20:29, 24 January 2019 (UTC)

@Mélencron: You've reverted this more than once, are you going to discuss this or not? Onetwothreeip (talk) 02:14, 25 January 2019 (UTC)

First of all, I'm literally the user who created this article, and the phrasing of the "other" column reflects that. Second of all, "too big" is literally beside the point, as different browsers automatically render the page differently, and all wikitables are too large to display on mobile devices anyway and are automatically scrollable; it's like complaining that a list of people on Wikipedia is "too long" – it's called being comprehensive, which is pretty much exactly the point. Third of all, I disagree that there can be a clear distinction made between most lower-polling candidates, especially if they're polling in a fairly similar range in general – between Holder and Kerry, you're removing candidates who are averaging 1% and above, which should be visibly a fairly clear delineation between candidates who just aren't being included in many polls (Holder and Kerry each in about half of all rows) and others who aren't being included in most polls or polling above that range. Don't forget that sampling error can be fairly substantial and "differences" between the levels of candidates can be white noise down to methodological choices or just pure chance alone. If Holder and Kerry were 0% in half of all polls, I'd agree, but that's just not the case. (Furthermore, I'll also note that the usage of the "others" column like this in downballot articles for 2018 primaries – which I also maintained – were essentially also completely comprehensive and almost always listed all candidates polled except for those polling at extremely low levels, e.g. MD-Gov Dem 2018 – collapsing to the "others" column, I argue, is just not informative to readers, which is why I'm also resistant to moves to use citation templates, since that's just not how ordinary readers use Wikipedia). Mélencron (talk) 02:23, 25 January 2019 (UTC)

What's the relevance of you starting this article? You don't own it at all. I'm making the article better but you're reverting those edits for reasons you haven't explained. Including candidates polling at 1% is an extraordinarily low inclusion criterion which is totally unnecessary when we have so many candidates polling at non-trivial amounts. Seriously, 1% in a poll of 1000 people is only ten people. The WP:OTHERSTUFFEXISTS argument that you're making isn't normally a strong one, but it's even more questionable when you're admitting that you're the one who created these examples that you're saying we should follow. I've had a look at the Democratic Party primary for the Maryland governor election that you talk about, and even the lowest polling candidate reached 5% in the polls, and even then you collapsed several candidates into "Others". You simply haven't given a reason why we should follow your previous examples, and it's not even clear how this article is following the other articles you've worked on either. Onetwothreeip (talk) 02:47, 25 January 2019 (UTC)

Alright, if you're literally going to ignore what I wrote, be like that, but fine – I'm talking about averaging at least 1% and being included in a substantial number of polls, which is a clear delimiter as to the number of candidates (that alone keeps it at the current 14 candidates, which again, is absolutely normal in terms of what you'll see on Wikipedia – the 2016 Republican article goes up to 17 candidates at one point who were included in most polls, and to invoke the example of other polling articles I've maintained outside of the U.S., the Dutch article requires 13 columns for parties, 16 notable candidates ran in the 2002 French presidential election, etc.) – and I'm surprised that you believe that this is an unreasonable number of candidates. If you want a higher threshold, fine, but that's impossible to enforce and that's part of what I've argued here – the candidates are polling at 0.89%, 1.60%, 1.73%, 2.09%, 2.29%, 2.50%, 3.47%, 3.50%... you're arguing over utterly microscopic margin-of-error-even-within-the-aggregate differences, and I see no reason not to be as comprehensive as possible, in the vein of the previous articles. (There are only six clearly high-polling candidates: Biden, Sanders, Clinton where she's included, O'Rourke, Warren, and Harris – anything below that and I'd argue it's a POV issue to selectively exclude them since these aren't statistically significant differences.)

I'm perfectly aware of WP:OTHERSTUFFEXISTS, but you're clearly arguing against existing precedent and past consensus in arguing that these lists be less comprehensive than they currently are simply based on some arbitrary thresholds which are ultimately pointless and make it harder to maintain. If you want to make it such that candidates suddenly get removed from the tables and added back a week later as soon as one or two slightly better polls for so-and-so comes in, be my guest, but you're arguing for an unmaintainable and untenable standard which ultimately wasn't used in the past for that exact reason. Mélencron (talk) 03:01, 25 January 2019 (UTC)
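As a rough illustration of the sampling-error point raised above, the 95% margin of error for a candidate at 1% in a poll of 1,000 respondents can be sketched as follows. This assumes simple random sampling and the normal approximation, which is crude for shares this close to zero; the function name is hypothetical and not taken from any pollster's methodology.

```python
import math


def margin_of_error(share: float, sample_size: int, z: float = 1.96) -> float:
    """Approximate margin of error in percentage points.

    share is a proportion between 0 and 1; z = 1.96 corresponds to a
    95% confidence level under the normal approximation (an assumption).
    """
    standard_error = math.sqrt(share * (1 - share) / sample_size)
    return 100 * z * standard_error


# A candidate at 1% in a poll of 1,000 people (about ten respondents).
print(round(margin_of_error(0.01, 1000), 2))
```

Under these assumptions the margin comes out at roughly 0.6 points, which is the same order of magnitude as the gaps between the lower-polling candidates listed above.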

So let's require an average higher than 1%, since 1% is clearly too low an inclusion criterion. It might make sense if there were only a few candidates at that level, but there are clearly many. You've admitted yourself that the "precedent" is tables that you've personally created. Overwhelmingly the polling articles on Wikipedia do not have nearly this many candidates in one table.

The table for the next Dutch election is fine since all those parties will win parliamentary seats, unlike the subjects of this article. The table for the 2002 French election is too much, and so are the largest tables for the 2016 Republican Party primary. You've only chosen the tables with the most candidates, most of which should be reduced in size the same way as this table should be. Obviously most such tables have far fewer candidates. These large tables are only normal on Wikipedia in the sense that sometimes Wikipedia has tables that are too large.

The average polling results as you have listed show that a relatively minor increase in the criteria would be enough to make the table smaller. It's absolutely not a WP:POV issue because it's completely irrelevant to how I may personally feel about the candidates. I've only been removing columns for candidates that aren't going to suddenly be polling better again, Eric Holder and John Kerry. Polling organisations are barely asking respondents about them, if at all. Onetwothreeip (talk) 03:21, 25 January 2019 (UTC)

The core of your disagreement seems to be on the basis that these tables are apparently "too big" and arguing that the table should be reduced "in size" – sorry, but I'm going to have to disagree with you that that's even a problem. If there are many candidates, then we should list them under reasonable criteria. The main article associated with this includes 8 major candidates already with dozens of other speculative candidates. Are we going to remove polls because they'd make the table "too long"? It's a completely arbitrary argument that has nothing to do with anything other than disagreeing with how the table looks.

It isn't even true that these "large tables are only normal on Wikipedia", as poll aggregators outside of Wikipedia clearly also list the vast majority of candidates polling at some significant level, and it's funny that you're arguing that it isn't somehow a POV issue, since you're making the exact argument I've made to other editors in the past in other polling-related RfCs which I ended up "losing" – because it is inherently a POV issue, as regardless of how you feel about the candidates, you're going to end up arbitrarily excluding many of them for little apparent reason other than on the basis of their polling numbers. As an editor on election-related articles, I'm obviously interested in elections and politics and have my own feelings about particular candidates and parties – but that doesn't mean that other individual decisions that have to do with those to whom I'm apathetic can't be subject to POV violations, which what you're suggesting clearly is. You're also making a speculative assumption that Holder and Kerry aren't going to be polled again on the basis of 3 of the more recent polls happening to come from pollsters which regularly omit candidates included by other pollsters. Mélencron (talk) 03:33, 25 January 2019 (UTC)

Actually, strike that, I've changed my mind. Mélencron (talk) 03:41, 25 January 2019 (UTC)

Just to clarify by what I meant with "only normal on Wikipedia", nothing to do with if they are normal on other websites. I said they are only normal on Wikipedia to the extent that it's normal for articles on Wikipedia to have problems. Onetwothreeip (talk) 03:15, 26 January 2019 (UTC)

Criteria for inclusion on national polls table

@Mélencron: @Onetwothreeip:

There seems to be some ambiguity about the level of relevance a candidate has to meet for inclusion in the national polls table on this page. Among the arguments for excluding candidates have been that: (1) they poll too low, (2) they get polled too infrequently or (3) they are not seriously running. However, some candidates included in the "others" column currently have more relevance than others with a column of their own. To solve these issues, I think we should set up an objective standard of inclusion for candidates on a column in the table.

I suggest that:

1. All candidates present in at least 60% of polls be immediately included.
2. All candidates polling over 4% in at least one poll be included.
3. Candidates with high name recognition but that formally declined to run (Hillary, Obama) be excluded.

Does anyone have a different suggestion? Emass100 (talk) 19:47, 26 January 2019 (UTC)
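The three suggested criteria can be sketched in Python as a rough illustration. The function name, the data layout (one dictionary per poll, with a candidate absent from the dictionary if the pollster did not offer them), and the reading of rules 1 and 2 as alternatives with rule 3 overriding both are all assumptions, not something settled in this discussion.

```python
from typing import Dict, List, Set

# One poll maps candidate name -> percentage. A candidate missing from the
# dictionary was not a response option in that poll (assumed layout).
Poll = Dict[str, float]


def gets_own_column(candidate: str, polls: List[Poll], declined: Set[str],
                    min_presence: float = 0.60, min_peak: float = 4.0) -> bool:
    """Apply the three proposed rules to one candidate."""
    # Rule 3: candidates who formally declined to run are excluded.
    if candidate in declined:
        return False
    results = [poll[candidate] for poll in polls if candidate in poll]
    if not polls or not results:
        return False
    # Rule 1: present in at least 60% of polls.
    presence = len(results) / len(polls)
    # Rule 2: polling over 4% in at least one poll.
    return presence >= min_presence or max(results) > min_peak


# Hypothetical data, not real poll results.
sample_polls = [{"A": 1, "B": 5}, {"A": 0}, {"A": 2, "C": 3}]
print(gets_own_column("A", sample_polls, declined=set()))  # in every poll
print(gets_own_column("C", sample_polls, declined=set()))  # rare and low
```

Written this way, a single result above 4% is enough to qualify a candidate, which is the sensitivity to outliers that the replies below object to.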

This is going to be an endless discussion by supporters of some candidates and I don't support any hard or inflexible requirements at this point in time, especially with earlier polls which will be collapsed anyway. In general, I favor taking the approach that's taken on downballot articles (where there's usually far less editing activity), which is to generally list the candidates who are appearing in most polls in general, regardless of the level they're polling. However, as some polls are including 20–25 candidates, and not always the same set of candidates, that isn't necessarily practical.

With the point with regard to Clinton, I disagree to some extent, in the sense that technically all primary polls at this point are "hypothetical", and thus will include candidates who are not running or sometimes still include individuals who have apparently ruled out standing as a candidate. In the case of Clinton, I'm in favor of maintaining her in the table mainly for the sake of clarity in multi-scenario polls and also because of the number of polls she is included in and polling at a significant level for contextual reasons.

I disagree with (2) automatically simply because of the inevitability of outliers (which will be bound to happen) and different numbers/sets of candidates leaving the possibility of producing different results between polls. However, I don't disagree with the general gist of any of these suggestions, and they're already what the current table generally reflects. (Of the candidates not included, Delaney/McAuliffe are included in many polls but polling too low; Michelle Obama is included in only 1 poll in each table; and all other candidates are included in too few polls and also polling too low; rather than setting specific criteria, I'd rather remain flexible as on other articles, as this otherwise leads into a rabbithole of constant edit-warring and I don't want this article to become unmaintainable simply just because it attracts the interest of more editors who otherwise normally wouldn't be involved.) Mélencron (talk) 20:06, 26 January 2019 (UTC)

And as an additional side note: these criteria have different implications depending on where you decide to break off the table; preliminarily I'm fine with the current state of the candidates listed in the table, but note that these are mostly just the same pollsters and this is worth revisiting after maybe another 10 new polls or so are released. Mélencron (talk) 20:12, 26 January 2019 (UTC)

We have enough candidates being polled to easily justify not including all of them. This article is about the polling, not candidates, so it includes all candidates even if they are considered "Others". Using notes in the Others column is actually quite generous since we do not normally do this. A better threshold would be something like having two polls at least 5%. That is a more intuitive number and is often used across Wikipedia, and more than one poll should be required because the highest (and lowest) results for each candidate are usually outliers. There is no reason that simply being in polls justifies having a separate column in this table. It's important to note that we actually are listing every single candidate, it's just that many of them are collapsed into the Others column, and we can easily expand them into a new column if they poll higher. It's actually more impressive if they aren't an announced candidate, since that would mean they have more room to poll higher. Onetwothreeip (talk) 20:58, 26 January 2019 (UTC)

There are two assertions here that are factually incorrect –

Using notes in the Others column is actually quite generous since we do not normally do this – the collapsing of candidates (whether infrequently-polled ones in primaries or third-party candidates in general election tables) is present in many – possibly a majority – of polling tables related to 2018 statewide elections, and

we can easily expand them into a new column if they poll higher – which is why I'm also reluctant to reduce the number of candidates too far, as it is in fact a PITA to have to add and remove candidates from these columns as the number of polls becomes larger, especially when you remember that some of these results are in fact rounded and need to be recalculated manually. I don't want it to be that we'll have to suddenly add and remove the same candidates every other week just because some polls happen to be more or less favorable outside the margin of error, and as I've said before, the standards you've proposed are unworkable. Mélencron (talk) 21:21, 26 January 2019 (UTC)

I can do this pretty quickly so I'll take up the responsibility if nobody else wants to, that's fine. If most statewide elections from 2018 do the same as here, they are also being generous. This is more comparable to national elections anyway. Onetwothreeip (talk) 22:06, 26 January 2019 (UTC)

@Mélencron: You keep reverting my edits based on criteria that only you agree to, saying things like this is how it's done on other polling articles, and it's just getting annoying. It's hard to restore my edits with the way you're doing this and I'd rather not have to keep doing this manually. Onetwothreeip (talk) 22:23, 26 January 2019 (UTC)

Maybe you should consider the fact that it's because you're unilaterally disagreeing with largely-accepted practice not only on here but on other opinion polling articles (I saw your exchange with Impru20 on the 2015 Spanish polling article as well) and that I think that you aren't "making the article better" and my concern isn't with your ability to "restore [your] edits"? Citing "how it's done on other polling articles" is, in my view, entirely the point, not an argument against current practice. Mélencron (talk) 22:29, 26 January 2019 (UTC)

Most of my change on the 2015 Spanish opinion polling article was approved by Impru20, so I don't know what you're talking about. I can say certainly that what I'm doing is conforming the table to be what is the accepted practice for polling articles on Wikipedia. The only examples you've shown otherwise are tables that you have personally created, or other tables that have a lot of candidates listed with their own columns. I'm sure you know that relatively few tables have as many candidates as that, and they are almost entirely for proportional representation elections. Nowhere is it considered normal to be including candidates in their own column if they consistently poll something like 0%. Onetwothreeip (talk) 22:38, 26 January 2019 (UTC)

You know as well as I do that none of the candidates currently in the table are "consistently poll[ing at] something like 0%", so I don't know why you're claiming that that's what I'm doing here. There isn't any disagreement here that the tables should be reduced to higher-polling candidates using the "others" column, I just disagree with your application of it. Let me just be more explicit in asking: who in the current national table do you believe shouldn't/should be there, and why? Mélencron (talk) 22:41, 26 January 2019 (UTC)

I appreciate that you have put a lot of work into these, but you have WP:BOLDLY created new columns for low polling candidates, and I have objected to that by effectively reverting that action. I suggest we discuss that further.

Of those remaining in the main table, Klobuchar isn't polling high enough to be retained in a column, but should be reflected in the Others column. More candidates would be merged into the Others as the table gets split off progressively, and it's different for those in the state tables as well. Onetwothreeip (talk) 22:46, 26 January 2019 (UTC)

Okay, but I'm not sure if I agree in that instance... if you take only the average of all polls (save the alternate Emerson scenario with only declared candidates), Klobuchar is polling above 2 other candidates and 8 candidates (including Warren, Harris, and Gillibrand, all candidates generally treated as "major") are within 3.6 points of each other. If you exclude the suspect Emerson poll (did someone mess up the SPSS output?), then Klobuchar is still polling above 2 others and 8 candidates are polling within 4.6 points of each other... which again is my concern, since a threshold that's too high is less workable in my opinion than a lower threshold. I think the current level is reasonably restrictive (as all others have been included in few polls/generally polling well south of 1%), but would instead lean towards removing Castro instead of Gillibrand in this instance. (Point taken regarding the state polls, however.) Mélencron (talk) 22:53, 26 January 2019 (UTC)

There's nothing suspect about the Emerson poll, they only polled candidates that have already announced. If we don't think it's a legitimate polling result then it shouldn't be there at all, but we could move it to some other table for hypothetical elections. Using an average of polling results is an unusual criterion, normally there's a given threshold and a number of polls above that threshold, but we can consider both. Who are you saying is polling higher than Klobuchar? Onetwothreeip (talk) 22:58, 26 January 2019 (UTC)

I'm not considering the alternate scenario because it's not really equivalent to the rest of the polls in the table. (I'm also considering with/without it entirely, as the data itself appears to be highly suspect; doubtful that Sanders is polling at only 5% and Biden 15–20 points higher than every other poll, among other things, which leads me to believe there might have been an SPSS issue.) Castro and Gillibrand are the other two below Klobuchar; since neither Klobuchar nor Gillibrand have ever polled at 0% and have been included in numerous polls where they've polled above that level, I lean towards keeping them at the expense of Castro (who polls at <1% on average when excluding the suspect Emerson data). A January cutoff also produces a similar result, and Castro again is under 1% minus Emerson, so I would lean towards removing him even then.

In this case, as the number of polls will be increasingly numerous, I also believe that it'll be more useful to average rather than consider discrete poll results within each table (rather than make decisions on the basis of individual outliers of candidates above 5% – which would also make the tables excessively wide over time if we accept every 5% result on its face as enough justification for the inclusion of a candidate) – but it's also worth noting that 1) there will be greater clarity on this over time as the lists of candidates pollsters test become increasingly consistent and 2) the date ranges over which these are considered also mean that different sets of "candidates" may qualify arbitrarily each time under different criteria. Mélencron (talk) 23:07, 26 January 2019 (UTC)

Seeing how I'm being mentioned here, I would say that trying to use criteria for poll inclusion applied in other countries is usually not a good idea, especially if it involves mixing up presidential elections with parliamentary elections, and different kinds of political and electoral systems. It should be mentioned that excluding any given option that is shown in a given source should be done very carefully, or else it may go against WP:NPOV. Onetwothreeip, here I'd remind you that while I ended up agreeing with you in that discussion, I also noted that you were maybe going too far in column deletion, as some of the parties you wanted to remove there were relevant under Spanish politics standards. Nonetheless, you're right in that it is obvious that the number of candidates here is simply too large to have them all included. Mélencron has a point in that we can't just keep removing and re-adding candidates every week. There must be some way to try to combine both views to have something workable. Clinton is obviously notable, but since she's not running and she's being left out from many polls I'd say to keep her as a footnote in the Others section: that would already save one column. On the criteria to have any candidate included, didn't the various media in 2016 set some polling criteria for including any given candidate in the primaries' debates? That could maybe work as a basis. Impru20^talk 23:11, 26 January 2019 (UTC)

I think the polling criteria were dropped for the debates in favor of fundraising and other measures of grassroots support (as on the GOP side this became a sensitive issue with their crowded debates and separate "undercard" debates for low-polling candidates only), but in any case I'm not sure following debate criteria alone can be entirely useful either (as they also differ between specific states – this is a discussion relevant to third-party candidates who are invited in some states but not in others). I'm not sure if I agree with the Clinton thing since, again, she's mostly been included in multi-scenario polls and I would rather it be clearer to readers what the difference between two rows for the same poll is rather than a set of slightly different numbers in each row without a clear difference. (To trim the table as it is now, I'd agree with removing Castro regardless of the time cutoff as he's polling below 1% save an outlier – maybe an Olympic average of some sort would make it easier to determine cutoffs? – but again, I'm worried about criteria which would be too strict rather than too loose simply because we're talking about margin-of-error/half-point differences between several candidates.) Mélencron (talk) 23:15, 26 January 2019 (UTC)
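The "Olympic average" floated above is commonly understood as a mean taken after discarding the single highest and single lowest value. A minimal sketch, assuming that definition (the discussion does not spell one out) and using made-up numbers rather than real poll results:

```python
from typing import Sequence


def olympic_average(values: Sequence[float]) -> float:
    """Mean after dropping one highest and one lowest value."""
    if len(values) < 3:
        raise ValueError("need at least three results to trim both ends")
    trimmed = sorted(values)[1:-1]
    return sum(trimmed) / len(trimmed)


# Hypothetical candidate with one outlier poll at 6%.
results = [0, 1, 0, 1, 6]
print(sum(results) / len(results))  # plain mean, pulled up by the outlier
print(olympic_average(results))     # trimmed mean, outlier discarded
```

With only a handful of polls per candidate, trimming removes a large share of the data, so this is a trade-off rather than a clear improvement over a plain mean.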

Impru20 Some of those political parties were only historically significant, and not particularly significant to the 2015 election. I'm well aware of Spanish political history and so I know the parties you're talking about. Either way, what we settled on was certainly fine.

Given that I was expansive in deleting columns in the Spanish election table, that was the reason why I have been very gradual in removing columns from this table, and always adding their results to the Others column. Onetwothreeip (talk) 23:24, 26 January 2019 (UTC)

I think it'll be worth revisiting removing Clinton from the table after maybe another 5 or so new polls are released (after which it'll probably be sufficient to re-collapse the table again to just 2019 polls) – if she isn't added to any other polls, probably worth removing that column. (The same goes for Klobuchar at that point as well if she isn't included in most 2019 polls up to that point, and any other candidates who might be worth removing if they start polling 0% consistently – which none are now, but things may change by then.) Mélencron (talk) 00:31, 27 January 2019 (UTC)

I'll give Klobuchar one more chance before I merge her results into Others. The main lesson here is to start with having the minor candidates in the Others' column, and then progressively adding them as they become popular enough, then removing them from new tables as their popularity ends and they stop campaigning. Onetwothreeip (talk) 05:13, 27 January 2019 (UTC)

FiveThirtyEight poll tracker

Note that some of the polls are national, while others are marked with "N.H." or "Iowa" if they are state-based. Hope this is useful to make sure none of the relevant polls are missed.--Pharos (talk) 21:43, 29 January 2019 (UTC)

Notes

https://thehill.com/homenews/campaign/419455-beto-orourke-seen-as-top-contender-in-2020-race-for-white-house-poll Mélencron (talk) 17:29, 3 December 2018 (UTC)

Gabbard (1+0+0+2+1)/5 = 0.8%
Delaney (0+1+1+1+0)/5 = 0.6%
McAuliffe (0+0+0+0+0)/5 = 0.0%

Kerry (1+1+2)/3 = 1.3%
Hickenlooper (1+1+0)/3 = 0.7%
Bullock (1+0+0)/3 = 0.3%
Holder (0+1+0)/3 = 0.3%
Inslee (0+0+1)/3 = 0.3%
Garcetti (0+0+0)/3 = 0.0%
Schultz (0+0+0)/3 = 0.0%
[Avenatti] (0+0)/2 = 0.0%
[Obama] (17)/1 = 17.0%
[Cuomo] (0)/1 = 0.0%

With regard to the most recent national polls table – probably breaking out Kerry into his own column. Mélencron (talk) 01:01, 19 January 2019 (UTC)
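The averages in the list above are plain arithmetic means of each candidate's results in the polls that included them. A short sketch that reproduces a few of them from the figures given; the dictionary layout and variable names are illustrative only:

```python
# Per-poll results copied from the list above (percent); each candidate is
# averaged only over the polls in which they appeared.
results = {
    "Gabbard": [1, 0, 0, 2, 1],
    "Delaney": [0, 1, 1, 1, 0],
    "Kerry": [1, 1, 2],
    "Hickenlooper": [1, 1, 0],
    "Bullock": [1, 0, 0],
    "Obama": [17],
}

# Plain mean, rounded to one decimal place as in the list.
averages = {name: round(sum(vals) / len(vals), 1) for name, vals in results.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.1f}% over {len(results[name])} poll(s)")
```

Printing the number of polls alongside each average makes it easier to see when a figure, such as the single-poll 17.0%, rests on very little data.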

Separating Castro's numbers now. Mélencron (talk) 02:36, 23 January 2019 (UTC)

Biden 25.1%, Sanders 15.8%, O'Rourke 10.1%, Clinton 8.0%, Harris 6.0%, Warren 5.3%, Bloomberg 3.4%, Booker 3.1% (Kerry 2.0%, Klobuchar 2.0%, Brown 1.5%, Gillibrand 1.2%, Bullock 1.0%, Castro 1.0%, Holder 1.0%, Hickenlooper 0.7%, Gabbard 0.5%, Inslee 0.5%, Delaney 0.3%, McAuliffe 0.3%, Garcetti 0.0%, Schultz 0.0%) Mélencron (talk) 00:13, 2 February 2019 (UTC)

Collapsing tables

@Mélencron: We really don't need to be collapsing these tables. We can separate them with subheadings but most articles like this on Wikipedia have polling results that span the entire time between elections without having to collapse them. It also makes them much harder to edit, since it's impossible to edit them on visual editor. Please don't take edits like this personally. Onetwothreeip (talk) 00:11, 2 February 2019 (UTC)

I'm personally really not a fan of VisualEditor at all especially when it comes to table editing as it constantly results in inconsistent markup and isn't really feature-capable... while it's true that it isn't really necessary to collapse them at this point (as the list isn't that long), the article will eventually become extremely long to the point that collapsing them will probably be necessary to make them reasonably navigable (see Nationwide opinion polling for the 2016 Republican Party presidential primaries: the uncollapsed content is reasonably navigable, but the collapsed tables are massive in length), and I also don't really see much reason for having to go back and edit older tables after they're already collapsed. Mélencron (talk) 00:16, 2 February 2019 (UTC)

(By the way: I've also been thinking about how to reasonably determine which candidates to include – I removed Brown earlier because it was pretty clear that he was only polling well with a single outlier pollster; see #Notes. I'm doing a RCP-style average – so not double-counting pollsters, and in order not to double-count polls because of multiple scenarios, averaged those where possible as well). Mélencron (talk) 00:21, 2 February 2019 (UTC)
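One way to read the averaging described above is as a two-stage mean: first average the scenarios within each poll, then average each pollster's polls, then average across pollsters. The sketch below implements that reading. How pollsters were actually de-duplicated is not stated here (taking only each pollster's latest poll would be another option), so the procedure, the function name, and the sample rows are all assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, Tuple

# One row per table line for a single candidate:
# (pollster, poll identifier, percentage). Multi-scenario polls contribute
# several rows with the same pollster and poll identifier.
Row = Tuple[str, str, float]


def pollster_balanced_average(rows: Iterable[Row]) -> float:
    """Average so that each poll and each pollster counts once."""
    by_poll = defaultdict(list)
    for pollster, poll_id, value in rows:
        by_poll[(pollster, poll_id)].append(value)

    by_pollster = defaultdict(list)
    for (pollster, _poll_id), values in by_poll.items():
        by_pollster[pollster].append(mean(values))  # collapse scenarios

    return mean([mean(values) for values in by_pollster.values()])


# Hypothetical rows: pollster X has a two-scenario poll and a second poll.
rows = [("X", "x1", 10), ("X", "x1", 20), ("X", "x2", 5), ("Y", "y1", 2)]
print(pollster_balanced_average(rows))
```

A plain mean over the four rows would weight pollster X three times as heavily as pollster Y; the two-stage version gives each pollster equal weight.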

We can collapse the tables when it gets to that point, we don't have nearly enough results for that. I really can't edit the tables without the visual editor, except for minor formatting edits. Onetwothreeip (talk) 00:33, 2 February 2019 (UTC)

Remove Clinton, Obama from the table, re-add Castro, Gillibrand, Klobuchar, Gabbard, Delaney and everyone actually running

Clinton and Obama are NOT going to run. They only get responses to polls because of their name recognition, but that's not worth anything, and they're taking up space for people who are actually running. Emass100 (talk) 17:30, 2 February 2019 (UTC)

I half-agree with Onetwothreeip on this and suggest holding off on this for now until we have a few more national polls (I know of one coming out on Monday and another later in the week), after which the table will probably be sufficient for 2019 polls alone/December polls can be re-merged with the October–November table as there'll only be a single poll each with either Clinton or Obama. As for the latter, here's what I have in terms of averaging (posted above earlier), minus Clinton and Obama: Biden 25.1%, Sanders 15.8%, O'Rourke 10.1%, Harris 6.0%, Warren 5.3%, Bloomberg 3.4%, Booker 3.1%, Kerry 2.0%, Klobuchar 2.0%, Brown 1.5%, Gillibrand 1.2%, Bullock 1.0%, Castro 1.0%, Holder 1.0%, Hickenlooper 0.7%, Gabbard 0.5%, Inslee 0.5%, Delaney 0.3%, McAuliffe 0.3%, Garcetti 0.0%, Schultz 0.0%. Gabbard and Delaney aren't polling at any significant level in polls where they're included, and I don't think there's a case to include any of the last 7–10 on this list. Mélencron (talk) 18:58, 2 February 2019 (UTC)

Why does it matter if they're running or not? This article is about polling, not about the election itself. Clearly their name recognition is worth something if that's what they're polling at. It's highly relevant to the results of the other candidates in the same poll since it takes support from other candidates, and that support it takes is not evenly distributed from all other candidates. The standard way we include parties or candidates in polling tables is if they meet a certain threshold, usually if they are polling above that threshold a certain number of times, especially when there are many options such as this.

I think Melencron might be making it more complicated than it needs to be with these polling averages. Once a candidate starts falling behind the rest then we would remove them, but really we should have started the opposite way. We should start with only the main candidates (and non-candidates), and add more as new candidates poll well. Onetwothreeip (talk) 21:44, 2 February 2019 (UTC)

That's what I mean by averaging, though – it's a fair metric of determining who's "falling behind" and candidates that are "poll[ing] well". Mélencron (talk) 21:55, 2 February 2019 (UTC)

We just don't really do that for anything else. If they were polling 20% and then started polling at 2% in the same table, we would surely keep them with their own column. We should look at their three or so highest results, and something like two of them should be 5% or one of them be like 8%. Onetwothreeip (talk) 05:45, 5 February 2019 (UTC)

The chart

It's not a very good chart. I don't think we should have a chart there at this time, but mostly this chart in particular is not very good. Is anybody willing to defend it? Onetwothreeip (talk) 05:43, 5 February 2019 (UTC)

Yes. We are at the point that the chart helps visualize increase and decrease of support over time. But we need to get rid of the empty space, and just show it based on what's already transpired. We also need to list the candidates in order of support, not in alphabetical order. DaCashman (talk) 06:56, 5 February 2019 (UTC)

I like the chart. Agree with DaCashman. Guycn2 · ☎ 07:23, 5 February 2019 (UTC)

The problem is that the chart does not correspond with the polling results. The other chart formats we use are better than this, I don't see a reason to create one like this. Onetwothreeip (talk) 07:26, 5 February 2019 (UTC)

The chart's kinda janky and it's not up to my usual standards chart-wise (something like this is what I usually make) – I'll see what I can do. (As for the idea that "the chart does not correspond with the polling results", I'm not exactly sure what you mean, since it's based on a weighted moving average of polling results – if you're wondering why Sanders/Harris are lower than recent results might suggest, keep in mind that they're weighted separately by pollster so as to reduce the house effects of specific pollsters and also to make sure that the averages aren't determined mostly by one pollster which polls much more frequently than other pollsters – and that there's no massive outlier on the high side for Sanders to balance out the Emerson poll). Mélencron (talk) 14:27, 5 February 2019 (UTC)

~~For reference, that means the most recent date on the chart (February 2) is effectively working with this data:~~

Poll source	Date(s) administered	Sample size	Margin of error	Joe Biden	Michael Bloomberg	Cory Booker	Kamala Harris	Beto O'Rourke	Bernie Sanders	Elizabeth Warren	Other
Morning Consult	Feb 1–2, 2019	737	± 4.0%	29%	2%	5%	14%	5%	16%	6%	9%^[a]
Monmouth University	Jan 25–27, 2019	313	± 5.5%	29%	4%	4%	11%	7%	16%	8%	10%^[b]
Emerson College	Jan 20–21, 2019	355	± 5.2%	45%	7%	8%	3%	3%	5%	3%	26%^[c]
Zogby Analytics	Jan 18–20, 2019	410	± 4.8%	27%	8%	1%	6%	6%	18%	9%	5%^[d]
Harvard-Harris	Jan 15–16, 2019	479	–	23%	5%	3%	7%	8%	21%	4%	8%^[e]

Notes

^[a] Sherrod Brown and Amy Klobuchar with 2%; Tulsi Gabbard, John Hickenlooper, and John Kerry with 1%; Steve Bullock, Pete Buttigieg, Julian Castro, John Delaney, Eric Garcetti, Kirsten Gillibrand, Eric Holder, Jay Inslee, Terry McAuliffe, and Gavin Newsom with 0%; other with 2%
^[b] Amy Klobuchar with 2%; Sherrod Brown, Julian Castro, Tulsi Gabbard, Kirsten Gillibrand, John Hickenlooper, Eric Holder, and Andrew Yang with 1%; John Delaney, Jay Inslee, and Terry McAuliffe with <1%; Pete Buttigieg with 0%; other with 1%
^[c] Julian Castro with 8%; Sherrod Brown with 4%; John Delaney and Tulsi Gabbard with 2%; Kirsten Gillibrand, Amy Klobuchar, and Richard Ojeda with 1%; other with 6%
^[d] Sherrod Brown and Kirsten Gillibrand with 2%; John Delaney with 1%; Julian Castro, Tulsi Gabbard, and Terry McAuliffe with 0%
^[e] Tulsi Gabbard and Kirsten Gillibrand with 2%; Julian Castro with 1%; Michael Avenatti with 0%; other with 3%

~~Mélencron (talk) 14:58, 5 February 2019 (UTC)~~

Never mind, there is indeed an error in the chart which I can't be bothered to fix for now so I'll remove it (the correct totals should be Biden 30.8%, Sanders 15.0%, Harris 8.9%, Warren 6.2%, O'Rourke 5.7%, Bloomberg 4.8%, Booker 4.3%, Brown 2.1%, Klobuchar 1.8% – off by 0.3% on average). Mélencron (talk) 15:06, 5 February 2019 (UTC)

I'm glad that's been sorted, you normally make good charts. I definitely think the polling results should be included in the chart as points and not just the average lines, and I think it's best to avoid creating an average. There are many possible ways to create averages from the data and for us to do that would be WP:ORIGINALRESEARCH. We can let the reliable sources make the averages and we can use them here if we want. There's also an issue of how many candidates we have in the chart, and I think it was generally pointless to have all those candidates below 5% represented in the chart. Onetwothreeip (talk) 21:57, 5 February 2019 (UTC)

Mélencron Can you explain how the polling averages match with the polling data? Really we shouldn't be having our own polling averages at all though. Onetwothreeip (talk) 03:10, 6 February 2019 (UTC)

They're fairly straightforward weighted moving averages with the results from each pollster weighted individually (so as to not oversaturate the average with the results of mostly a single pollster) and accounting for the recency of the poll (with more recent polls weighted more heavily than older polls). As for your other point, it's what I've seen on most other (non-U.S.) polling articles, but I've never seen them used on U.S. articles (except at the presidential level) and I also remember reverting other users/IPs who added links to RCP averages in the past on downballot election articles, so it might not be consistent with my previous reverts for me to include one here. (In some sense, it's ironic that this is the case given that poll aggregation in the media tends to be much more uncommon outside the U.S. than it is in the U.S.) Mélencron (talk) 03:20, 6 February 2019 (UTC)
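
The exact weights are never stated in this thread, so the following is only a minimal sketch of the kind of average described above, under two loudly labelled assumptions: each pollster contributes only its most recent poll (so no pollster is counted twice), and a poll's weight halves every `HALF_LIFE_DAYS` days. The function name `weighted_average` and the 14-day half-life are hypothetical. The figures are the Biden, Sanders and Harris columns of the five-poll table earlier in this section. This is not the method used for the removed chart and is not expected to reproduce its totals.

```python
from datetime import date

# (pollster, last field date, {candidate: percent}); figures copied from the
# five-poll table earlier in this section.
POLLS = [
    ("Morning Consult", date(2019, 2, 2), {"Biden": 29, "Sanders": 16, "Harris": 14}),
    ("Monmouth University", date(2019, 1, 27), {"Biden": 29, "Sanders": 16, "Harris": 11}),
    ("Emerson College", date(2019, 1, 21), {"Biden": 45, "Sanders": 5, "Harris": 3}),
    ("Zogby Analytics", date(2019, 1, 20), {"Biden": 27, "Sanders": 18, "Harris": 6}),
    ("Harvard-Harris", date(2019, 1, 16), {"Biden": 23, "Sanders": 21, "Harris": 7}),
]

HALF_LIFE_DAYS = 14.0  # ASSUMPTION: not taken from the discussion


def weighted_average(polls, as_of, half_life=HALF_LIFE_DAYS):
    """Recency-weighted average using one (the latest) poll per pollster."""
    # Step 1: keep only each pollster's most recent poll on or before as_of.
    latest = {}
    for pollster, end, results in polls:
        if end <= as_of and (pollster not in latest or end > latest[pollster][0]):
            latest[pollster] = (end, results)

    # Step 2: weight each retained poll by exponential decay in its age.
    totals, weights = {}, {}
    for end, results in latest.values():
        w = 0.5 ** ((as_of - end).days / half_life)
        for candidate, pct in results.items():
            totals[candidate] = totals.get(candidate, 0.0) + w * pct
            weights[candidate] = weights.get(candidate, 0.0) + w

    return {c: totals[c] / weights[c] for c in totals}


if __name__ == "__main__":
    for name, value in weighted_average(POLLS, date(2019, 2, 2)).items():
        print(f"{name}: {value:.1f}%")
```

Changing the half-life or the one-poll-per-pollster rule changes the result, which is the point at issue in the exchange above: the average depends on choices the raw polls do not dictate.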

What's the weighting though? That's what I'm asking. I have almost never seen polling averages on any opinion polling article we have. I would much rather we used the averages from reliable sources in some way. We're already putting in original research as to how the chart looks, it's more tenuous to be reporting that average as if it was a fact. Onetwothreeip (talk) 04:01, 6 February 2019 (UTC)

It's a visual summary of the data... that's the entire purpose of poll aggregation... and it's a weighted average since it's accounting for the results of more recent polls than older polls, unless you think that a polling average is supposed to be a bunch of horizontal lines? (Also, you say that you've "never seen polling averages on any opinion polling article we have"... I've got most of them watchlisted and the vast majority of dedicated polling articles for future elections do in fact contain them, so I have no idea where you're getting that from.) Mélencron (talk) 04:45, 6 February 2019 (UTC)

(FYI: I don't plan to include a chart again, and if I do, I'll be taking it directly from FiveThirtyEight's GitHub repos – CC-BY-4.0 license, so freely licensed for use here.) Mélencron (talk) 04:52, 6 February 2019 (UTC)

I don't understand why you can't tell me what weighting you're using. It seems like you're just telling me the definition of weighting. Onetwothreeip (talk) 05:32, 6 February 2019 (UTC)

Sherrod Brown

I don't think there's enough reason to keep Brown in the Other column, and I think there is enough reason to make a new column for Brown. In the timeframe of the main table he has polled 8% ~~and 7%~~, despite polling 1% nine times and 2% four times. The average may not be high for him, but keeping those 8% ~~and 7%~~ results out of their own column means the Other column goes up to 26% and 22% respectively. Onetwothreeip (talk) 22:09, 5 February 2019 (UTC)

I think you've misread one of those footnotes... the highest he's polled is 7% in the December Emerson poll (a clear outlier in both December and January), and the other one has him at 4% (but Castro at 8%). On average he's not really breaking out of the field any more than Klobuchar is, though, and almost all of his results are in the 1–2% range. (I'll remind you that it was your idea to remove some lower-polling candidates in order to make the table thinner... higher "other" totals are a consequence of this, but I'll note that Emerson's "someone else" options produce higher results than other pollsters' as well which inflates the "other" column totals – 6% and 15% respectively, and they remain around 20% and 15% respectively even after breaking out Brown. I'd probably just suggest waiting to see if he rises to poll at 3–4% nationally more consistently; both he and Klobuchar are polling higher in Iowa but lower nationally.) Mélencron (talk) 23:20, 5 February 2019 (UTC)

My mistake, but that also underlines that we are closer to adding columns for Brown and Castro than for Klobuchar. Onetwothreeip (talk) 02:56, 6 February 2019 (UTC)

Castro's also only polling well in the same outlier poll (though it's true that the average of recent polls incorporating him produces a similar result – more anecdotally, Castro's at 0% in quite a few polls while Brown and Klobuchar have always polled at least 1% in every poll they've been included in). Mélencron (talk) 03:22, 6 February 2019 (UTC)

In my opinion, we should determine who gets a column based on how regularly they are included in polls, not support levels. DaCashman (talk) 06:47, 6 February 2019 (UTC)

Just to clarify the hyperlinks

Basically it's impossible to hyperlink the entire name like "Joe Biden", all that is possible is "Joe Biden", which is functionally the same to the reader. That is how the names are only hyperlinked once, but with two links, one for the given name and one for the surname. It's simply an unfortunate consequence of the < br > template. Onetwothreeip (talk) 03:07, 10 February 2019 (UTC)

@Mélencron: Thanks for sorting out the hyperlinks with breaks. It's right that we hyperlink the first instances as the newest poll result, since it's all the way down the bottom of the table otherwise. It's fine, we'll just put them up every time there's a new poll. I'll do it if nobody else will. Onetwothreeip (talk) 07:51, 10 February 2019 (UTC)

I applied some tweaks and fixes as well. Looks consistent now. — JFG ^talk 10:41, 10 February 2019 (UTC)

@Mélencron: Readers read the article and tables top-down. Onetwothreeip (talk) 22:11, 12 February 2019 (UTC)

Klobuchar et al

I think three polls at 5% is a reasonable benchmark, or one poll above 10%. Onetwothreeip (talk) 01:01, 18 February 2019 (UTC)

Yeah, that's my thought – but a bit stricter by saying 3 polls from 3 different pollsters (to prevent duplicates/double-counting as the DNC's criteria does, but with a specific set of pollsters as well and a different threshold). As of now, there are a few candidates who have hit at least 5% once, but none yet 3 times other than those in the table. Mélencron (talk) 04:54, 18 February 2019 (UTC)

I don't mind if it's from the same polling organisation, but some of them do two polls at once and only the highest result should be considered of those. I don't anticipate another candidate other than Klobuchar reaching this, and even with her I am doubtful. Onetwothreeip (talk) 05:00, 18 February 2019 (UTC)
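
The two versions of the benchmark proposed above can be compared with a short sketch. The function name `qualifies` and its parameters are hypothetical, and "one poll above 10%" is read here as 10% or more, which is an assumption. The sample figures are the Bloomberg and Booker columns from the five-poll table in the chart discussion, so the outcome applies to that subset only and not to the full article table.

```python
def qualifies(results, min_pct=5, min_polls=3, auto_pct=10, distinct_pollsters=False):
    """Return True if a candidate meets the proposed column benchmark.

    results: list of (pollster, percent) pairs for one candidate.
    distinct_pollsters=True models the stricter variant (3 polls from
    3 different pollsters); False counts every poll at or above min_pct.
    """
    # A single result at or above auto_pct is enough on its own.
    if any(pct >= auto_pct for _, pct in results):
        return True
    hits = [pollster for pollster, pct in results if pct >= min_pct]
    count = len(set(hits)) if distinct_pollsters else len(hits)
    return count >= min_polls


# Figures from the five-poll table in the chart discussion (subset only).
BLOOMBERG = [("Morning Consult", 2), ("Monmouth University", 4),
             ("Emerson College", 7), ("Zogby Analytics", 8), ("Harvard-Harris", 5)]
BOOKER = [("Morning Consult", 5), ("Monmouth University", 4),
          ("Emerson College", 8), ("Zogby Analytics", 1), ("Harvard-Harris", 3)]

if __name__ == "__main__":
    print("Bloomberg:", qualifies(BLOOMBERG, distinct_pollsters=True))
    print("Booker:", qualifies(BOOKER, distinct_pollsters=True))
```

The sketch does not handle the case raised above of one organisation releasing two polls at once; taking only the higher of the two would need an extra grouping step.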

Agreed. I was about to suggest adding a column for Klobuchar, but it's still too soon; perhaps the 5% was only a reaction to her recent announcement, similarly to Stacey Abrams' State of the Union rebuttal that earned her a spot at 5% although she is not even a candidate. — JFG ^talk 17:52, 18 February 2019 (UTC)

2017 polls

I had removed the 2017 polls as outdated, and Mélencron restored them. I do believe they are irrelevant. In particular, we should remove as undue the lone 2017 Iowa poll sponsored by an O'Malley PAC, and he is not even a candidate. Let's discuss. — JFG ^talk 10:47, 10 February 2019 (UTC)

I completely agree with Melencron, we need to preserve every single reliable poll. If they are commissioned by a candidate, we should indicate that. Onetwothreeip (talk) 11:07, 10 February 2019 (UTC)

I concur – they're there for historical purposes, and if we want to talk about "irrelevant" polls, then we should remove every single Iowa caucus poll not conducted in the last week leading up to the caucus itself – lists of opinion polls on Wikipedia are meant to be comprehensive, and I'd argue that readers can make that judgment themselves. If the poll was conducted, it should be included, simple as that. Mélencron (talk) 14:39, 10 February 2019 (UTC)

The primary campaign is barely starting now. Whatever opinion poll was conducted over 3 years before the election can certainly be deemed irrelevant. Not convinced these 2016-2017 occasional polls even have any historical value. Poll aggregators in sources do not take them into account, and neither should we. — JFG ^talk 19:10, 10 February 2019 (UTC)

We're not aggregating these polls either. I don't think there's any precedent for excluding certain poll results on any of these polling articles. Onetwothreeip (talk) 20:08, 10 February 2019 (UTC)

Essentially this – all polling lists for future elections on Wikipedia attempt to cover all polls since the last election; there's no reason to make an arbitrary exception based on an arbitrary threshold because of their age. Mélencron (talk) 20:27, 10 February 2019 (UTC)

Fine. I still believe the Iowa 2017 thing is meaningless, but if it's tradition to keep every poll forever, let's honor tradition… — JFG ^talk 08:24, 12 February 2019 (UTC)

I'm not aware of any such tradition being the reason for this. We're just here to create a comprehensive article about the opinion polling for a given election. While we're at it, can we keep the "among declared candidates" poll down the bottom? It disrupts the flow between the post- and pre-October 2018 tables. Onetwothreeip (talk) 22:15, 12 February 2019 (UTC)

Most candidates launched their campaigns in early 2019, so that any polls restricted to declared candidates make more sense within the section about most recent polls. Hopefully within a couple months, most polls will have focused only on declared candidates, and that will become the first table. — JFG ^talk 17:49, 18 February 2019 (UTC)

It's not about "tradition", it's part of the historical record that this page encompasses. Remember Wikipedia is not a news source but an encyclopedia. If this page is titled "Opinion polling for the 2020 primary", then its scope covers all opinion polling taken for that election, not just the most recent few polls. XavierGreen (talk) 19:32, 18 February 2019 (UTC)

Splitting the table

Mélencron All the splits really are arbitrary, it was just that the November polls were the next ones to be shifted over. If you want to shift over the December polls as well then I would have no problem with that, I would support that, I just assumed that shifting the December polls would happen eventually, like in a month or so. Really there was no reason to undo that, it took me time to make that and it seems like you're admitting it would happen in some form anyway. Onetwothreeip (talk) 01:54, 2 March 2019 (UTC)

Yeah, I did, as I mentioned in my edit summary... but I'm not really sure why December would make more sense than a split by year, which I think I mentioned a while back anyway as something that would probably have made sense to do by mid-February. Please don't view my revert as anything against you personally, I was just a little puzzled by the cutoff you chose. Mélencron (talk) 01:58, 2 March 2019 (UTC)

It wouldn't have been a permanent cutoff, the "cutoff" was moving gradually as the main table got larger, that's all. I'm puzzled why you undid that and then split it off again, and also adding the December polls to them. Before my last comment I was only aware of the revert you had made. Onetwothreeip (talk) 02:00, 2 March 2019 (UTC)

The split at 2019 makes sense, given the flurry of announcements that occurred in January; thanks both of you for your work. — JFG ^talk 06:45, 2 March 2019 (UTC)

Race entry dates of candidates in main table

In the main table about national polling since 2019, I have added the dates when major polled candidates have entered the race. I believe that is informative to explain their jumps in popularity (or, for some candidates, lack thereof). — JFG ^talk 17:25, 9 March 2019 (UTC)

I disagree, and this is something that both I, Impru20 (talk · contribs), and other editors have frequently removed when they're added – these are lists of opinion polls, not exhaustive lists of events that could potentially have had an impact on opinion polls, and there are other articles that exist precisely for that purpose. Mélencron (talk) 19:44, 9 March 2019 (UTC)

@Onetwothreeip, Mrodowicz, Yeah 93, and XavierGreen: – your input would be appreciated here. Mélencron (talk) 23:26, 9 March 2019 (UTC)

I think it's a can of worms. The field has the potential to be larger than the 2016 GOP field, which means 15+ candidates, and it looks like a mess whenever a candidate enters and exits the race. Look, let's be smart about this. There is no reason to keep a column dedicated to Michelle Obama, who we know damn well isn't going to run and has been featured in only 3 polls this year - for good reason. If anything, keep the poll but add her in a note, or put it in a new, hidden, "hypothetical polling" chart. Likewise with Hillary Clinton. --yeah_93 (talk) 00:11, 10 March 2019 (UTC)

I certainly wouldn't agree with it being used for all candidates. Just the ones that have their own columns in the table. The fact that a candidate isn't running is not a reason not to display the results for that poll. Shifting it to another section would be understandable if it was only Hillary Clinton versus Michelle Obama, but those polls include results from people like Biden, Sanders and Warren. However, I would support shading the columns in grey of those candidates who are confirmed not to be contesting. Onetwothreeip (talk) 00:19, 10 March 2019 (UTC)

Something I wholeheartedly agree with. At least for now they should stay since two people are supporting them and one is opposed. I thank JFG for doing this. If what Mélencron has done is notified another editor who they believe agrees with them in this instance, that would be a clear violation of WP:CANVASSING. Onetwothreeip (talk) 00:08, 10 March 2019 (UTC)

I have no idea what the other editors I pinged in this instance think – I pinged them since they've commented on this talk page before. In any case, I've re-added the announcement dates to the table. Mélencron (talk) 00:22, 10 March 2019 (UTC)

I was referring to your comment at 19:44. The comment at 23:26 was fine, and I should have made that distinction. Onetwothreeip (talk) 00:26, 10 March 2019 (UTC)

Klobuchar, Bloomberg, Clinton & Obama

In the last four Morning Consult polls Klobuchar has polled 3%+ compared to 2% in each of the said polls for Bloomberg. Today Clinton & Bloomberg confirmed that they won't be running in 2020. M. Obama ruled out running some time ago. Therefore, Klobuchar needs her own column. Possibly replace Obama column with Klobuchar. Not sure whether Bloomberg and Clinton columns need to be retained either.--Mrodowicz (talk) 02:14, 6 March 2019 (UTC)

Clinton/Obama have been previously discussed, and I don't agree with removing them at this point (it's important context, especially when they're taking a significant share of the vote when they're still included). Regarding Bloomberg, I think it's still useful to keep his column for historical purposes but maybe collapsed if the tables were separated into one for every two-month period (i.e. this one ending February). Among these suggestions I think I agree most with adding Klobuchar, as she's consistently the highest-polling candidate among those without a dedicated column, and consistently polling above one who did have a column (Bloomberg, who polled at or above 5% on four separate occasions since the new year – Klobuchar has only done so once, however). I'd rather proceed with a bit more caution regarding adding additional columns, though – I earlier suggested that polling ≥5% in at least 3 separate polls from 3 separate pollsters would be a reasonable threshold (Bloomberg does with 4, Booker with 3). Mélencron (talk) 04:11, 6 March 2019 (UTC)

@Mélencron I'm generally in agreement with what you've said. I'm not particularly fussed about whether the Bloomberg, Clinton & Obama columns remain - there is some sense to keeping them as you say. However, if we want to avoid adding new columns, we should ensure that those with columns are more significant than those without. As you say, Clinton & Obama are significant for historical purposes as they record high polling numbers when included in polls, so they can stay. Bloomberg, by contrast, has polled very modestly. He was only considered as 'somewhat significant' when he was a potential contender. Now that he is definitely out of the race, if we are to choose between him and Klobuchar, I think Klobuchar would be the better choice.--Mrodowicz (talk) 04:58, 6 March 2019 (UTC)

I don't think there's a need to "choose between" – it's just that he won't be included in polls from this point out, so his column will mostly just contain dashes, and there's no need to remove his column because of it (though it is a logical breaking point in the table). I'm generally fine with adding Klobuchar, but I'd like to wait a little bit longer to see if she polls higher, even though she's already polling higher relative to Bloomberg. I'd like to hear what other editors who have this page watchlisted think, though. Mélencron (talk) 05:07, 6 March 2019 (UTC)

Essentially we've kept Bloomberg here for the same reasons as we have kept Clinton and Obama, because at some point he did poll what we consider high enough and excluding him would be hiding a significant result for those polls and inflating the Others column. I would prefer we eventually have neither Bloomberg nor Klobuchar than both. Onetwothreeip (talk) 05:48, 6 March 2019 (UTC)

I think I differ in that we don't need tables to cover a certain period of time like two or three months, and I think the polling threshold shouldn't regard separate polling organisations. Also if anybody new suddenly reaches 10%, and I don't think anybody disagrees, they should be added into the table right away. Onetwothreeip (talk) 05:52, 6 March 2019 (UTC)

I agree with this. Absolutely no reason for M. Obama to remain in the chart considering only 2 polls have featured her this year. Clinton and Bloomberg are out, and I think we need another chart without those three. It's simple logic. --yeah_93 (talk) 12:09, 6 March 2019 (UTC)

I've created a new table retaining the existing list of candidates starting in March – Michelle Obama/Hillary Clinton can be added back if they're included in any polls/poll higher, but at least for now the new table will exclude them (and Bloomberg, of course). Mélencron (talk) 13:16, 6 March 2019 (UTC)

Obama's inclusion in the table is obvious, for polling above 10% in any poll.

What's the point in making a new table? Surely it would make more sense to have a table for January to March. Onetwothreeip (talk) 20:41, 6 March 2019 (UTC)

It seems like a logical breaking point for where candidates will no longer be polled (both Bloomberg and Clinton having definitively ruled out running yesterday). The GOP article in 2016 did something similar whenever candidates jumped into the race and dropped out. Mélencron (talk) 21:31, 6 March 2019 (UTC)

If only. Bloomberg is still being polled and it's never mattered to polling organisations if Clinton or Obama were candidates. I think the article for the 2016 Republican nomination only makes sense when it was the last few candidates progressively removing themselves in 2016 where there were a lot of polls. Onetwothreeip (talk) 21:40, 6 March 2019 (UTC)

As someone who, the other day, made the suggestion of updating the table to reflect the new realities of the race, I take on board @Onetwothreeip's point. I suppose it makes sense to leave things as they are to the end of the month. In April a new table should go up without Bloomberg (and possibly without Clinton & Obama) and should also include any other candidates whose polling has improved by that time.--Mrodowicz (talk) 01:35, 7 March 2019 (UTC)

I would add Klobuchar to the 2019 table immediately, because she has consistently polled at 2–3%, and even hit 4–5% occasionally. For Bloomberg, Clinton and Obama, I agree with removing them from the next table that should start in April 2019. — JFG ^talk 13:36, 7 March 2019 (UTC)

Okay, I think I might be fine with adding only Klobuchar at this point given that she's clearly the top candidate not currently in the table and only one user here currently objects to that, but preserve the current month's table as-is and just create a new one in April. Mélencron (talk) 15:08, 7 March 2019 (UTC)

When will you finally remove Clinton, Obama and Bloomberg from the table, and add Klobuchar? The table is inadequate right now. — Preceding unsigned comment added by 80.83.135.90 (talk) 08:17, 7 March 2019 (UTC)

When there's a new table. The table has Clinton and Obama with results over 10% and Bloomberg had several results over 5%. Klobuchar has only polled 5% once. Onetwothreeip (talk) 22:14, 7 March 2019 (UTC)

Furthermore I think it's clear that there is about as much merit adding Klobuchar to the table at this time as there is for others like Brown and Castro. Onetwothreeip (talk) 22:24, 7 March 2019 (UTC)

@Onetwothreeip Based on polling averages Castro has 0.8% and Brown, who has now left the race, had 1.7%, whilst Klobuchar has 3.3%. Whilst her score is not particularly high it is twice that of Brown and 4 times that of Castro. She also outpolls Bloomberg who was on 2.5%.--Mrodowicz (talk) 00:37, 9 March 2019 (UTC)

We don't go by averages though, we go by thresholds. It might be higher than Brown and Castro but they're not in the table either. Who here supports Klobuchar being listed in a column, and on what basis? Onetwothreeip (talk) 02:26, 9 March 2019 (UTC)

@Onetwothreeip Averages are arguably better indicators than thresholds, as they dampen the impacts of rogue polls. An Emerson poll from 20th Jan had Sanders on merely 5% whilst Booker & Castro had 8% each. Anyone with a passing interest in US politics would know this to be a joke, which was proven 4 weeks later when Sanders raised 6 million in a day - 5% candidates don't raise record number donations! But Booker & Castro here had well exceeded your threshold. So thresholds on their own are a bit useless. You ask 'Who here supports Klobuchar being listed in a column, and on what basis?' I will do my best to answer:

Included in table: Booker (4.5%), Bloomberg (2.5%)

Excluded from table: Brown (1.7%), Castro, Gillibrand, Gabbard, Inslee, Hickenlooper, Yang etc. (all with 1% or less)

Klobuchar is above 3%. Looking at who is included and excluded, which category is the best fit for Klobuchar - the 1%ers or the 3-4%ers? I think that the answer is pretty obvious. Now if including Klobuchar resulted in a cluttered table, I would concede that you have a point. However, the table with her inclusion looks fine to me, so I'm not completely sure what your objections are?--Mrodowicz (talk) 07:24, 9 March 2019 (UTC)

That's the reason why we don't use averages, they completely ignore significant polling outliers. If one of those people you listed suddenly polled at 20% tomorrow, that would easily warrant attention. Bloomberg has reached 5% in the same table numerous times, while Klobuchar has only done so once. I don't see why 3% would be high enough either, even if that's an average. The threshold I have set is attaining 5% or more at least three times, or at least 10% once. This is why I do not support a column for Brown or Castro. You're saying Klobuchar should be included in the table because she has an average of 3%? Quite honestly I'd much rather just remove Bloomberg from the table. Onetwothreeip (talk) 11:08, 9 March 2019 (UTC)

@Onetwothreeip by your own rationale if the rogue January Emerson poll mentioned above, had Brown & Castro at 10% (rather than 8%) you would insist that they be given their own columns even though all other polls have them at around 1%.--Mrodowicz (talk) 11:31, 9 March 2019 (UTC)

The reason for the rogue Emerson poll was a bad methodological choice, by the way – the list of candidates was 1) non-randomized (alphabetical) and 2) separated into two lists of candidates; this second list was hidden behind a 3) "someone else" option which allowed respondents to hear additional candidates, which were all choices that served to massively boost Castro and Brown and push down Sanders and Warren. (This is also why I would tend to support the use of basic averaging to determine inclusion over mere thresholds, which are far more sensitive to single outliers, of which there'll be many over time.) Mélencron (talk) 12:48, 9 March 2019 (UTC)
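
The disagreement over averages versus thresholds can be illustrated with Castro's figures from the five-poll table and footnotes in the chart discussion (0%, 1%, 8%, 0%, 1%). This is only a sketch: the helper name `summarize` is hypothetical, and the second data set is the counterfactual raised above (Emerson at 10% instead of 8%), not a real poll result.

```python
from statistics import mean, median

# Castro in Morning Consult, Monmouth, Emerson, Zogby, Harvard-Harris
# (from the table footnotes in the chart discussion).
CASTRO = [0, 1, 8, 0, 1]

# COUNTERFACTUAL from the comment above: Emerson at 10% rather than 8%.
CASTRO_IF_10 = [0, 1, 10, 0, 1]


def summarize(values):
    """Return the statistics each side of the argument would look at."""
    return {"mean": mean(values), "median": median(values), "max": max(values)}


if __name__ == "__main__":
    print("actual:", summarize(CASTRO))
    print("counterfactual:", summarize(CASTRO_IF_10))
```

A single-poll threshold looks only at the maximum, so one outlier decides the outcome; a mean moves by a fraction of the outlier, and a median barely moves at all. Which behaviour is preferable is exactly what the editors above disagree about.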

Brown and Castro did not reach 10% in any poll in that table, so that's a non-issue. A poll being good or bad or not is completely irrelevant to how we treat it, including whether or not we include it. Not only would it be original research, it would be completely against this article being a list of all relevant polling information. What matters is simply if the source is reliable, as in it is actually a poll. This is why it should take 10% in one poll, not because it shows any sign of popularity, but because that is a significant poll result and should be prominently displayed in an article about polling. We're not here to maintain some sort of political leaderboard. Outliers are notable results, not something to be hidden. I would appreciate an answer to my question though: you're saying Klobuchar should be included in the table because she has an average of 3%? Onetwothreeip (talk) 21:34, 9 March 2019 (UTC)

I need to go in a moment, but the answer to your question is a "yes". Mélencron (talk) 21:40, 9 March 2019 (UTC)

@Onetwothreeip whichever method we use, be it thresholds or polling averages, neither of these is perfect and both methods have their strengths and weaknesses. Your threshold of 3 polls of 5% or 1 poll of 10% is one way of doing it and probably not a bad way all things considered. In an ideal world, we would show all the candidates, regardless of their polling numbers, but in practical terms it makes little sense to do this, so we only clearly display the best polling candidates. In relation to Klobuchar, I'll take a quote from you, made a few days ago on this page: 'Essentially we've kept Bloomberg here for the same reasons as we have kept Clinton and Obama, because at some point he did poll what we consider high enough and excluding him would be hiding a significant result for those polls and inflating the Others column.' On the basis of what you said above, it could be argued that not including a Klobuchar column leads to inflating the 'Others' column by 3-4%. If she's polling well above what the rest in the 'others' column are polling why not include her? In relation to your question about whether I believe Klobuchar should be included because she's averaging 3%, I'll answer that I believe that she can be included because the resulting inclusion deflates the 'others' column and her inclusion does not corrupt the table in any way. Note that Booker and Bloomberg poll at similar levels to Klobuchar. I concede that if there were 5 people in the others column polling at around 3%, I would probably not be arguing that they should all be given a column of their own, even though this would deflate the others column.
Perhaps rather than setting your threshold suggestion as a hard and fast rule, we could instead definitely provide a column for all those who meet your threshold, whilst allowing some discretion for select candidates who don't meet the rule, but all things considered are otherwise seen as being a sensible inclusion.--Mrodowicz (talk) 08:55, 10 March 2019 (UTC)

She's in the middle between those who have columns and the rest of those considered Other, although there are often those in Other with the same or similar result. I think my threshold guideline is a particularly inclusive one, since it includes Bloomberg when arguably he shouldn't be, but he's doing better in the polling than Klobuchar, and Booker is clearly doing better than both. What makes my threshold relaxed, at least to me, is that I only consider three polls above 5% as necessary, when it could just as well be four or five polls. My main concern is overwhelming readers with a vast and unnecessary amount of information. Onetwothreeip (talk) 10:27, 10 March 2019 (UTC)

We are also giving more weight to recent polls. Looking at the last 10 polls where both candidates are present (except the "Bold Blue Campaigns" outlier), the average score for Klobuchar is 2.7% and the average for Bloomberg is 2.1%, so it would be unfair to keep Bloomberg while removing Klobuchar. Regarding excessive information, the table will get uncluttered in April, when Bloomberg, Clinton and Obama are removed. That will make room for perhaps other candidates who may reach 3–5% levels. Given the wide field of candidates, opinion will likely be spread too thin for most of them to ever reach 3%. Look what happened with Republicans in 2016 before Iowa. — JFG ^talk 11:47, 10 March 2019 (UTC)
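For anyone wanting to reproduce this kind of comparison, the averaging described above (last 10 polls in which both candidates appear, with a named outlier left out) could be sketched roughly as follows. This is only an illustration: the function name, the data layout, and the sample figures are hypothetical, not the actual poll data.

```python
def shared_average(polls, a, b, last_n=10, exclude=()):
    """Average two candidates over the most recent polls that include both.

    polls   -- list of dicts ordered oldest to newest; each has a "pollster"
               key plus one key per candidate tested (hypothetical layout)
    exclude -- pollster names to skip, e.g. an agreed outlier
    Returns (average_a, average_b), or None if no poll qualifies.
    """
    shared = [
        p for p in polls
        if a in p and b in p and p["pollster"] not in exclude
    ]
    recent = shared[-last_n:]  # keep only the newest `last_n` shared polls
    if not recent:
        return None
    n = len(recent)
    return (sum(p[a] for p in recent) / n, sum(p[b] for p in recent) / n)


# Made-up figures purely for illustration.
sample = [
    {"pollster": "P1", "X": 4.0, "Y": 2.0},
    {"pollster": "Outlier", "X": 10.0, "Y": 0.0},
    {"pollster": "P2", "X": 2.0},            # Y was not tested here
    {"pollster": "P3", "X": 2.0, "Y": 2.0},
]
print(shared_average(sample, "X", "Y", exclude={"Outlier"}))
```

Which polls count as "shared" and which count as outliers are editorial choices, so two editors can reasonably arrive at different averages from the same table.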

There is no reason to give more weight to recent polls, and doing so is clearly a case of WP:RECENTISM that isn't suitable for an encyclopaedia. In ten years from now, nobody is going to care much about the difference between a poll in March 2019 and January 2019. The polls earlier this year that Bloomberg did better in are just as relevant to this article as the polls conducted recently. Onetwothreeip (talk) 11:52, 10 March 2019 (UTC)

Of course nobody will care 10 years from now, but right now readers are most interested in recent developments. Recent polls matter more, simply because since January many candidates have declared their intent to run and others declined. In any case, other candidates are polling far behind Bloomberg and Klobuchar, so that their inclusion is representative of the present state of the field. — JFG ^talk 13:38, 10 March 2019 (UTC)

This is an encyclopaedia though, WP:NOTNEWS. If readers want to know the latest about the race, they can search news sources. I said ten years because that's the ten year test (WP:10YT), we're not a live blog or something like that. As for averages, Klobuchar's average is 1.9% while Bloomberg's polling average is 2.9%, using their better result when an organisation releases more than one at the same time. I don't think an average of less than 2% is enough to qualify for a column, and is pretty clearly WP:UNDUE weight when compared to columns like Biden and Sanders. Onetwothreeip (talk) 22:16, 10 March 2019 (UTC)

RCP average?

As you may know, RealClearPolitics is tracking the polls for this race and averaging them: https://www.realclearpolitics.com/epolls/2020/president/us/2020_democratic_presidential_nomination-6730.html

Should we add it to the page? I believe it'd be a terrific inclusion. Their average is usually included for the General Election polling page. Sorry if this has been discussed before without my knowledge. --yeah_93 (talk) 13:11, 14 March 2019 (UTC)

I think more recent consensus has been against including external polling averages on U.S. election articles – it was removed several times on the 2017 AL-Sen article, for instance, and never added at any point to 2018 articles. Furthermore, I dislike the RCP average in general because they only selectively and arbitrarily include certain polls and pollsters, and as such their averages are incomplete (and are just simple averages with no weighting of any sort). Mélencron (talk) 13:37, 14 March 2019 (UTC)

Should Klobuchar still have a slot?

She's polling at 1-3%, which is statistically close to some candidates who don't have a slot (Buttigieg (technically, not a candidate yet), Castro, Inslee, Hickenlooper, and Gillibrand). Should we remove her from March, or should we consider adding some of the other candidates?- Mr X 🖋 15:21, 24 March 2019 (UTC)

I'm a "no" on adding other candidates (they're all polling below Klobuchar and intermittently polling above 1% if at all), but also think it's worth considering removing her in the near future if she continues to poll at 1% in the next few national polls. In March, only Delaney, Buttigieg, and Kerry have polled at 3% or higher, and only once for each. Mélencron (talk) 16:27, 24 March 2019 (UTC)

I also think the inclusion of Klobuchar was premature and based on faulty logic that first accepted Klobuchar should be included and then worked back from that. Onetwothreeip (talk) 21:26, 24 March 2019 (UTC)

Clinton does not belong

Clinton is not a candidate. She doesn't belong on the polling tables, at least not the March one. - Mr X 🖋 22:40, 25 March 2019 (UTC)

It's irrelevant if she is a candidate or not. She is polling. Onetwothreeip (talk) 12:18, 26 March 2019 (UTC)

I disagree. She is not in consideration for the Democratic Party presidential primaries. Maybe we are selecting the wrong type of polls for the purposes of this article.- Mr X 🖋 12:41, 26 March 2019 (UTC)

I also generally agree, and lean more towards the idea that we should place more weight on whether someone is an actual candidate in considering who should be included in the table rather than whether they're merely being included because the likes of Mark Penn and John McLaughlin feel like it. Mélencron (talk) 12:59, 26 March 2019 (UTC)

I'd also indicate that the consensus on this talk page leans overwhelmingly against giving Clinton, Obama, or both their own dedicated columns: I count Emass100, Mrodowicz, yeah_93, MrX, Impru20, IP 80.83, and myself as in favor of removing either or both, and only Onetwothreeip as in favor of their inclusion, so unless any other editors object, I'll remove their columns shortly. Mélencron (talk) 13:05, 26 March 2019 (UTC)

@Mélencron: This is getting ridiculous. The consensus against Obama and Clinton having columns was for what was the post-March table, where they had not polled significantly, so of course they wouldn't have columns in that table and I agree with that. When they poll at 25% though, certainly that deserves its own column. It gets ridiculous having "Others" cells of over 30%. I intend to rectify this soon. Onetwothreeip (talk) 11:38, 28 March 2019 (UTC)

The consensus was that they should not be included based on the fact that they're obviously not candidates. The criterion that you're insisting on is that they're polling over 20–30% in the very occasional polls in which they're included. Stop trying to impose changes on articles based on WP:IDONTLIKEIT when the obvious consensus in many cases for the changes you desire is the exact opposite – it's purely disruptive behavior. Mélencron (talk) 11:40, 28 March 2019 (UTC)

Clinton and Obama should still be included in the pre-2019 tables, just like Winfrey is there although she had long stated that she would not run. Clinton should also be in the January-February table because she was included in polls by three different institutes and scored highly there, and because she was still ambiguous about her intent to run. From March, that's a settled issue, as she declared "I'm not running" on March 4, and only one poll still lists her. — JFG ^talk 09:26, 29 March 2019 (UTC)

Disruptive behaviour? Do you have anything to support that I am being disruptive? That sounds like a very spurious accusation. I continue to argue that a result of >20% should obviously be in its own column, and there is nothing disruptive about it. It seems like you just don't want to hear me say it, and I have no idea why, it's a complete surprise to me why you find it so distasteful. I think we should give consideration to the suggestion that we put polls containing results for Clinton and Obama into their own table. Other than that, "I'm not running" is not a good reason to exclude anyone from anything since this article details polling results and we're not a news outlet. Onetwothreeip (talk) 09:49, 29 March 2019 (UTC)

The combination of these proposals seems like the best way to address this; I'll take a look at this later. Mélencron (talk) 11:57, 29 March 2019 (UTC)

Suggest splitting 2018 and 2019

I suggest we split 2018 and 2019. That way, we don't have to include the year in the 'Date(s) administered' column. This would give us enough extra horizontal space for another column when we need it. Any objections?- Mr X 🖋 12:12, 27 March 2019 (UTC)

The amount of space taken up by 4 characters is negligible; the reason that I chose to merge the tables from October onwards was because of a consistent set of candidates. The 2016 GOP article split tables whenever candidates entered/dropped out; a logical point for that here would be Bloomberg's choice not to run in early March. Mélencron (talk) 12:22, 27 March 2019 (UTC)

A consistent set of candidates is not really a good reason for merging years. That's what headers are for. Six characters is a lot considering the slate of candidates and non-candidates.- Mr X 🖋 12:30, 27 March 2019 (UTC)

Alright, removed the years. Mélencron (talk) 12:45, 27 March 2019 (UTC)

The recent merger of all national polls since October 2018 into a single table looks very clumsy to me, especially with "other" scores routinely over 20% because non-runners like Hillary Clinton or Michelle Obama have no column in this global table. The article looked more informative when 2018 polls were separate, then Jan–Feb 2019, then everything since March (with fewer columns). I would suggest restoring the previous format, for example as of this revision. — JFG ^talk 23:26, 28 March 2019 (UTC)

I don't mind splitting 2018 and 2019, but I am opposed to arbitrary splits like every two months. I suppose we could split every month or every quarter. - Mr X 🖋 00:09, 29 March 2019 (UTC)

The 2016 GOP article basically split whenever a candidate jumped in or dropped out – though note that that article listed every candidate polled with their own individual column (which isn't feasible here). I think Bloomberg dropping out is the logical point to split here. Mélencron (talk) 00:59, 29 March 2019 (UTC)

That's one way to do it, but let's get some consensus rather than just repeating what other articles have done and boldly making dramatic changes back and forth. Also, the headings are now not consistent with how the tables are split. March polls appear under two time spans. That's confusing. - Mr X 🖋 02:07, 29 March 2019 (UTC)

Thanks for the updates. I agree that the split in March makes sense. Headings were corrected in the meantime. Next splits can be envisioned whenever the field of viable candidates changes significantly. Discussion prior to any split would be indeed welcome. — JFG ^talk 09:08, 29 March 2019 (UTC)

My position on this is like that of MrX, some splitting of the table is fine, but every two months is too much. "Bloomberg dropping out" is hardly a good reason to do anything. Onetwothreeip (talk) 09:49, 29 March 2019 (UTC)

The March split is justified by both Bloomberg dropping out and Clinton finally stating unambiguously that she won't run. In January and February her name was still included in questions by various pollsters, since then only one. — JFG ^talk 12:05, 29 March 2019 (UTC)

JFG That wasn't an update; it was other editors pushing back on the idea of arbitrary splits. Let's make it independent of when non-candidates become even more non-candidate-y.- Mr X 🖋 11:35, 29 March 2019 (UTC)

I started an RfC below. - Mr X 🖋 11:47, 29 March 2019 (UTC)

Text and number alignment in cells

I made a (difficult) edit to left align the date field and right align the sample size field in the table, for readability. The reason for alignment is so that readers can quickly read the information without their eyes having to dart back and forth to discern the information. Mélencron insists that there is some standard that says everything should be center aligned. I contend that it's laziness, not a standard.- Mr X 🖋 12:36, 27 March 2019 (UTC)

It's a practice used on every other polling list on Wikipedia for a reason (center-aligning of dates and sample sizes) – I'm not saying that there's an obvious policy reasoning behind it, but I do contend that the alternative you've implemented is much less readable than the original state of the list, which is why it's used throughout the site. Mélencron (talk) 12:44, 27 March 2019 (UTC)

That doesn't mean that it's intentional. I actually think it's sheer laziness. Hell, most sources don't even center align text, which is a good indication that we're doing it wrong: [1][2]. I don't know if we have a MOS guideline on this, but here is some wisdom: [3]. Can you please explain how neatly aligning information makes it less readable?- Mr X 🖋 12:48, 27 March 2019 (UTC)

I see you've made a post to the associated MoS talk page; I'll follow what the guidance says (if any), but it's not clear to me that there's an existing guideline on this, either (I also looked). If it's clear that there's something that requires a projectwide change or at least to opinion polling articles in particular, I'll post about it to WT:E&R for visibility. Mélencron (talk) 13:14, 27 March 2019 (UTC)

Yeah, I was very surprised that there is no guidance on it. Hopefully we can get others to weigh in.- Mr X 🖋

FWIW, I find the "everything centered" look to be the most readable for those tables. — JFG ^talk 09:18, 29 March 2019 (UTC)

JFG Could you explain why you personally find it more readable? That notion is contrary to standard, widely-adopted formatting for tabular data, for the same reasons that text is left aligned in print and on the web, or why numbers are right aligned in census tables, bank statements, scientific data, etc. - Mr X 🖋 11:32, 29 March 2019 (UTC)

For example, when aligning small numbers like percentages to the right, they get pushed to the edge of the cell, which is quite unwieldy, unless we add some padding which would make the markup very heavy. — JFG ^talk 12:04, 29 March 2019 (UTC)

Yes, (I thought I had addressed that previously) I agree that the small percentages can be center aligned. I would like to see sample sizes right aligned, and dates left aligned. - Mr X 🖋 13:10, 29 March 2019 (UTC)

Buttigieg

~~If Buttigieg polls at 3% or better on the next national poll, we should add him. He has polled at 3% twice, 2% once, and 1% several times. His Iowa numbers are pretty impressive as well.~~- Mr X 🖋 22:51, 25 March 2019 (UTC)

~~...and it looks like we have just reached that point. Following the standard established for Klobuchar in the January–February 2019 table, I'm going to add Buttigieg.~~- Mr X 🖋 12:00, 26 March 2019 (UTC)

Noting that MrX changed their mind and removed these comments, I cannot endorse the suggestion either. The standard for Klobuchar as it has been for everyone is 5%, ideally a few times. I would say that merging Klobuchar back into others in the March table supports merging back in the January-February table, and now that Clinton has a polling result in March we might as well merge all the 2019 polls into one table again. Onetwothreeip (talk) 12:17, 26 March 2019 (UTC)

I'm not sure 5% is the right threshold. Is there some logic behind it? Clinton, Obama, and Bloomberg should not be on any of the tables because they are not candidates, which makes their polling numbers outside of the scope of this article.

If we re-add Klobuchar, we need to also add Buttigieg. I would support combining all of the national tables, leaving Clinton, Obama, and Bloomberg out, and creating a clear, consistent inclusion threshold for the remaining candidates. We may also want to include an inclusion threshold for the polls themselves, based on sample size. Also let's make some of the columns sortable. - Mr X 🖋 12:40, 26 March 2019 (UTC)

I'm 1) opposed to excluding polls based on sample size (this is meant to be a comprehensive list and we aren't the arbiters of whether or not to include polls – that's POV), but I could be in favor of creating a single table for every poll conducted since October 2018. The candidates whose inclusion I would support in such a table would be Biden, Bloomberg, Booker, Harris, Klobuchar, O'Rourke, Sanders, and Warren, while excluding Clinton and Michelle Obama. Bloomberg has polled at a fairly significant level in the past before and been included in almost every poll from October through when he ruled out running in March, so I maintain that he should be noted. Clinton and Michelle Obama should both be included in the "others" column in this case. I'm opposed to the inclusion of Buttigieg for now, but would lean towards giving him a dedicated column should he continue to poll at at least 3% (and hit 4% or 5% in some polls) in the future. (I also generally disagree with the weight that Onetwothreeip puts on using whether candidates poll high in just one or two polls to determine inclusion, rather than a general ranking of which candidates are breaking out and which aren't; of the candidates I've listed, all have generally outpolled the rest of the field whenever they've been included.) For polls conducted before 2019, I'd support keeping columns for Biden, Booker, Cuomo, Gillibrand, Harris, Sanders, Warren, and Winfrey. Mélencron (talk) 12:55, 26 March 2019 (UTC)

OK, I'm not in hurry to add Buttigieg. I can live with Klobuchar given that we've combined tables. I think Bloomberg is a waste of a good column, but meh.- Mr X 🖋 13:54, 26 March 2019 (UTC)

I think he's worth noting given the number of times he polled ≥5%; before Klobuchar declared her candidacy, he polled at her level or higher in 16 of 19 polls, fwiw. (Side note: on this talk page, I count 5 users – myself, Mrodowicz, Emass100, JFG, and IP 80.83 – in favor of including Klobuchar in the table and only one – Onetwothreeip – against. I'd like to see Buttigieg poll more consistently higher in forthcoming national polls before adding him yet, though I don't doubt that he's probably seeing a bump right now.) Mélencron (talk) 14:02, 26 March 2019 (UTC)

Suggesting that I think a candidate polling "high in just one or two polls to determine inclusion" is a total fabrication of anything I have said. I think my position on Klobuchar has been vindicated and providing a column for her polling results looks unfortunate in retrospect. This is why I have insisted on a 5% threshold, because it's very easy for candidates to rise to a 3% average and then fall below that. The point of this table is that when a candidate gets a column, they have that column forever. We aren't here to monitor who should lose a column for poor performance. I didn't make it up, polling at 5% multiple times reflects the thinking on many other articles, and there's good reason why we don't use averages to determine these things on other polling articles. 5% at least three times is the best inclusion criteria. While we're at it, the two 2019 poll tables might as well be merged now. Onetwothreeip (talk) 21:31, 26 March 2019 (UTC)

The downside to keeping columns forever is that the width (as well as the length) of this table could become quite unwieldy by Super Tuesday. I wouldn't be surprised to see 3-4 other candidates poll at 5% or more before it's over. FWIW, Buttigieg will probably break through 5% very soon because of recent news coverage. We can add him then. I would strongly disagree that we need to wait for three polls at that level, and obviously we have already broken that rule anyway. - Mr X 🖋 21:46, 26 March 2019 (UTC)

Forever meaning in that particular table. When a new table is made, not all of the columns have to be replicated. By 2020 we wouldn't have a column for Clinton or Obama, but the columns for Clinton and Obama would remain in the 2018/19 tables, it's not like they would be removed from low/no performance in 2020. A Wikipedia editor's opinion assuring us that somebody will reach 5% very soon is obviously not something we would take into consideration. If and when he gets to 5% a few times then of course we would create a column for his poll results, there is no need to do so in preparation for this. I think all editors when they come to this article need to pretend they don't know what is actually happening in the election, and act as if we are a jury that is only allowed to look at the facts in front of us. Onetwothreeip (talk) 21:52, 26 March 2019 (UTC)

OK, I thought we were talking about keeping a single table from now on, but if there is some milestone at which we would start a new one, the issue of multiple columns is a non-concern. As Mélencron pointed out, there's pretty clear consensus for leaving Clinton and Obama out. My speculation about Buttigieg was not to suggest that we need to create a column now—only when he polls at 5% or better. I agree with your last point, and as long as we have consistent criteria, it shouldn't be a problem.- Mr X 🖋 22:02, 26 March 2019 (UTC)

There's only consensus for leaving out Clinton and Obama when they don't poll high or often enough. We don't need a new table every month or so, of course. They have been polling more than 20% in some results so in some table at least they would need a column. It is incredibly important information when assessing the results of the other candidates in the same poll. Onetwothreeip (talk) 22:21, 26 March 2019 (UTC)

Before we decide who should have a column and who shouldn't, we need to first decide whether we should group the polling by quarterly intervals (Jan-March) or bi-monthly intervals (Jan-Feb). Once that is sorted, we can have a sensible conversation about the criteria for inclusion. It's very difficult to do this the other way around. How you sub-divide the polling data will inevitably have some impact upon whether a particular candidate is granted a column.--Mrodowicz (talk) 10:16, 27 March 2019 (UTC)

Without knowing how many polls there will be, or who may enter or leave the race, it's difficult to know how we might split the table until the time comes.- Mr X 🖋 11:27, 27 March 2019 (UTC)

Seeing as Quinnipiac has him at 4% and every recent poll has shown a bump for him, I think I'll go ahead and WP:BOLDly add a column to the table (as some other editors have already done over the past few weeks). Mélencron (talk) 10:13, 28 March 2019 (UTC)

If and when Buttigieg polls significantly is when we should create a column for Buttigieg's results. I've maintained that significant ought to mean reaching 5% a non-trivial amount of times, but there's no way Buttigieg has been polling of any note or significance yet. Major crystal ball stuff going on here, it's not our job to declare who are the frontrunners. Onetwothreeip (talk) 11:12, 28 March 2019 (UTC)

Okay, but I contend that you're alone in the threshold you insist on and this isn't crystal-balling – it seems self-evident that Buttigieg is polling high enough to be included in the table based on the polls that have now been released, and your threshold places much more weight on outliers than it does the broader image. Mélencron (talk) 11:34, 28 March 2019 (UTC)

His last results are 4%, 3%, 2% and 1%. 5% is already a low bar, and otherwise we would be including Abrams, Brown and Castro, and probably Kerry and Holder. You remember what the table was like with all those columns, let's not go back there. Onetwothreeip (talk) 11:41, 28 March 2019 (UTC)

(edit conflict) 4% is significant. I support Mélencron's bold edit, and consensus seems to lean toward it, but I'm not inclined to revert. 5% would be a different matter, without any requirement for multiple instances. - Mr X 🖋 12:00, 28 March 2019 (UTC)

It's the same for 4% with Buttigieg. Unless I am mistaken, they have not reached 4% three times. The criteria for inclusion without multiple instances was 10%. Onetwothreeip (talk) 12:05, 28 March 2019 (UTC)

That's your criteria, which most users on this talk page don't strictly agree with (given the general consensus in favor of including Klobuchar but also excluding Clinton/Obama.) Mélencron (talk) 12:14, 28 March 2019 (UTC)

I'm not aware of any consensus for establishing 10% as a threshold, or any other number really, or multiple instances. So far, this has been mostly ad hoc.- Mr X 🖋 12:16, 28 March 2019 (UTC)

Yes I was describing "my" criteria, I'm sorry if that wasn't clear, but these are principles that are common on other polling articles and not arbitrarily my opinions. I think everyone would be happy if we just made a separate table for polls that include Clinton and Obama since they are fairly special cases and when people are considering the context of the election they know these polls wouldn't normally belong in the same table as those which are polling for who are expected to be candidates. I still think it's sensible we take some time to wait and we were more cautious on Klobuchar than we are now on Buttigieg, but also that we've seen high polling from Abrams, Brown, Castro, Kerry and Holder and it would have been a mistake to give them a column when their high polling outcomes happened, I think we can all agree. Onetwothreeip (talk) 21:24, 28 March 2019 (UTC)

Not sure if I agree with the separate table idea, either – I'd prefer to minimize the number of separate tables needed, again, if possible, and as you've previously noted, it's not obvious why a separate table is always needed since each poll tests different scenarios in any case. I do think that both Obama and Clinton's figures are notable, but the consensus as of now is overwhelmingly against including them (visibly) in the tables because they aren't candidates in any universe (maybe in an alternate universe I'm not aware of, though). Mélencron (talk) 22:47, 28 March 2019 (UTC)

I'm starting to warm a bit towards @Onetwothreeip's idea of three times 5% and one times 10% thresholds as a criteria. The problem we're finding ourselves in, is that no one else has articulated an alternative viable position. I, among others supported a separate column for Klobuchar because she was polling at 3 or 4% for a few weeks. Now that she is polling at a bit above 1%, no different to the rest of the pack, a separate column seems less justifiable. Of course polling numbers will fluctuate in any case, but if we set a reasonably high (but not unreasonable) criteria, as @Onetwothreeip has done, then at least we're not arguing all the time over a candidate's worthiness for inclusion. Most, I think, are opposed to the polling averages criteria, so we need an alternative solid criteria we can all agree on, and so far the 3*5/1*10 rule seems the most plausible option put forward.--Mrodowicz (talk) 02:42, 29 March 2019 (UTC)
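To make the "3*5/1*10" rule concrete, here is a minimal sketch of how it could be checked, assuming a candidate's results are given as a plain list of percentages. The function name and parameters are hypothetical and only illustrate the rule as described above; the sample list uses the four Buttigieg results quoted earlier in this thread (4%, 3%, 2%, 1%).

```python
def meets_fixed_threshold(results, low=5.0, low_count=3, high=10.0):
    """Check the proposed rule: `high` once, or `low` at least `low_count` times.

    results -- a candidate's percentages in one table; None means the
               candidate was not included in that poll (assumed convention)
    """
    polled = [r for r in results if r is not None]
    if any(r >= high for r in polled):
        return True  # a single result at or above 10% is enough
    # otherwise require at least three results at or above 5%
    return sum(r >= low for r in polled) >= low_count


print(meets_fixed_threshold([4, 3, 2, 1]))  # the Buttigieg figures quoted above
```

Because the rule only ever counts results upward, a candidate who qualifies in a given table never stops qualifying in that table, which matches the "once someone gets a column, that should remain" position.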

I'm not opposed to the principle, but I think it's a rule which in general is too sensitive to outliers. As an alternative, I think maybe setting a threshold of polling at least 2% or higher in at least 5/10 and at least 3% or higher in 3/10 of the last 10 polls from unique pollsters in which they have been included may be appropriate, in addition to being included in at least a bare majority of polls within a table. That's a fairly limiting threshold which avoids using polling averages at any point. (This almost certainly excludes Klobuchar immediately if she fails to get 3% in the next poll; Buttigieg would need to hit 2% with a different pollster without falling below 2% in any intermediate polls. Among the polls taken since the midterms, no candidate comes anywhere close to meeting this threshold other than the ones currently in the tables plus Buttigieg.) From the midterms through February, Klobuchar polled at least 3% with 4/9 pollsters and at least 2% in 7/9. (Brown only hits 3% with 1/9 and 2% with 2/9, for comparison; Gillibrand only hits 3% with 1/9, and with no other polls at or above 2%; no other candidates included in a majority of polls even hits 3%.) Mélencron (talk) 02:54, 29 March 2019 (UTC)
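A rough sketch of the unique-pollster rule proposed above, for illustration only. It assumes polls are listed oldest to newest as (pollster, percentage) pairs, treats "5/10" and "3/10" as absolute counts of 5 and 3, and leaves out the separate "included in a bare majority of polls" condition; the names and the sample data are hypothetical.

```python
def latest_per_pollster(polls, window=10):
    """Return the newest result from each of the `window` most recent pollsters.

    polls -- list of (pollster, pct) tuples ordered oldest to newest,
             restricted to polls that included the candidate
    """
    seen = {}
    for pollster, pct in reversed(polls):  # walk from newest to oldest
        if pollster not in seen:
            seen[pollster] = pct           # keep only the newest per pollster
        if len(seen) == window:
            break
    return list(seen.values())


def meets_rolling_threshold(polls, window=10):
    """At least 2% with 5 pollsters and at least 3% with 3 pollsters."""
    latest = latest_per_pollster(polls, window)
    at_2 = sum(p >= 2.0 for p in latest)
    at_3 = sum(p >= 3.0 for p in latest)
    return at_2 >= 5 and at_3 >= 3


# Made-up figures: pollster "A" appears twice, but only its newer result counts.
sample = [("A", 1), ("A", 3), ("B", 3), ("C", 3), ("D", 2), ("E", 2), ("F", 1)]
print(meets_rolling_threshold(sample))
```

Counting one result per pollster is what keeps an organisation that polls weekly from dominating the window, which is the bias this variant is meant to avoid.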

My concern is that it is fair enough that Clinton and Obama shouldn't be considered candidates in the same way that Biden, Sanders etc are, but also that their polling results are very significant and shouldn't be hidden away in the others column. We have precedent for this when we separated the polls that featured only declared candidates into their own table.

I intended the three-times 5% criteria to be one that excludes candidates with outlier results, but we can raise this to four or five times. I think reducing the polling amount to be included really starts including insignificant candidates, and I would also like to see criteria that is as simple as practicable. Onetwothreeip (talk) 06:47, 29 March 2019 (UTC)

Setting a 5% threshold is too high while there are 15+ candidates. I would suggest "3% in at least 3 out of the last 10 polls" is a simple enough rule to follow, and would roughly match what we are getting now. Klobuchar would be removed as soon as the next poll rolls in, unless she gets 3% there. Buttigieg would get a column next time he hits 3% (he is there in two recent polls already). — JFG ^talk 09:17, 29 March 2019 (UTC)

The problem with any criteria out of the last x polls is that this would end up removing candidates from the table. It's an absolute nonsense that we would purposefully add and remove columns. Once someone gets a column, that should remain, and a new table can be formed where they may or may not be included there. Onetwothreeip (talk) 09:49, 29 March 2019 (UTC)

I fail to see what's wrong with removing candidates who haven't stayed above a reasonable threshold for a while. Case in point: Klobuchar. She got a bump in polls that put her temporarily above the pack of the 1-percenters, but that was short-lived, so she got included for a few weeks, and she will soon be removed. No problem at all. — JFG ^talk 11:58, 29 March 2019 (UTC)

Agreed. (As a side note, I'd prefer to keep the stipulations on the 3%-in-10 criteria, mainly because the rule is otherwise biased towards both those organizations which poll most frequently as well as more minor outlier results – if they're polling at least 2% in a majority of polls in addition to the other stipulations, it's probably not just noise.) Mélencron (talk) 12:02, 29 March 2019 (UTC)

Can you clarify which "stipulations" you'd like to follow in addition to the suggested "3% in 3 of the 10 latest polls"? — JFG ^talk 13:10, 29 March 2019 (UTC)

Instead of the "10 latest polls", consider 1) the most recent polls from the "10 most recent pollsters", and also 2) "2% in 5 of the last 10". As of now, however, there isn't much of a difference between the two rules, because of the number of different pollsters which have been in the field (the overall difference for Buttigieg and Klobuchar is just 1 poll – using your rule, it's Buttigieg (3, 5), Klobuchar (3, 6) versus my proposal, which would produce Buttigieg (3, 5), Klobuchar (2, 6) and remove Klobuchar from the most recent table). Mélencron (talk) 13:13, 29 March 2019 (UTC)

(edit conflict) While I kind of like JFG's suggestions of "3% in at least 3 out of the last 10 polls" as being more inclusive, I can't help but think that we are not getting much closer to a consensus here. Perhaps we should see how poll aggregators address this?- Mr X 🖋 13:17, 29 March 2019 (UTC)

I decided to go for a pure threshold because of Onetwothreeip and Mrodowicz's objection to the use of averaging. Mélencron (talk) 13:32, 29 March 2019 (UTC)

RfC: Should the poll tables be split by month, quarter, year, or when candidates drop out of the primaries?

The consensus is for no set criteria for poll tables.
Cunard (talk) 05:29, 5 May 2019 (UTC)

The following discussion is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

Should the poll tables be split by month, quarter, year, or when candidates drop out of the primaries?- Mr X 🖋 11:46, 29 March 2019 (UTC)

Month - The number of national polls justifies splitting by month to keep the table sizes manageable, and to keep the time span logical for readers. It has the benefit of allowing us to add or delete columns in a more granular fashion, so that we avoid issues with columns filled with blanks, and uncomfortably wide tables. My second choice would be by quarter, then year. I'm opposed to splitting by any non-time-related criteria such as when candidates drop out of the primaries because I believe it would be confusing to our readers.- Mr X 🖋 11:46, 29 March 2019 (UTC)
No set criteria, decide on an ad hoc basis – the number of polls each month will be variable and it makes little sense to split every single month, especially if some months will have the same exact set of candidates or relatively higher or lower volumes of polling; the necessity of different columns for candidates would be better to justify this, but I'd suggest always splitting after a new year (2018/2019), rather than arbitrarily by each month. Mélencron (talk) 11:59, 29 March 2019 (UTC)
Ad hoc – Events should drive the splits. 2018 midterms was a natural cutting point, so was January 2019 when a lot of candidates announced their campaigns. If the field remains relatively stable over several months, there is no reason to enforce a split at a calendar boundary. Editors should suggest new splits when they feel enough change on the ground justifies a new table. — JFG ^talk 12:02, 29 March 2019 (UTC)
Quarterly, otherwise ad hoc. There aren't going to be enough events, but when they do happen and conform with changes in the data, they are good opportunities for splitting the table, with quarterly splits as the basis. We really should not be editorial on this article at all, and we need to disregard the actual nature of the election as much as we can so that we can present the information in the most neutral and encyclopaedic way. Onetwothreeip (talk) 23:53, 31 March 2019 (UTC)
@Onetwothreeip: is there some way of splitting the poll tables that you would consider editorializing?- Mr X 🖋 00:22, 1 April 2019 (UTC), 01:57, 1 April 2019 (UTC)

Assuming you don't mean "would not", doing so to establish what constitutes a significant event would be editorialising. Claims that a column would be "wasted on Bloomberg" or that Hillary Clinton "isn't even running" for example indicate that people are projecting their views of how they see the election onto this article. Onetwothreeip (talk) 00:39, 1 April 2019 (UTC)

No set criteria. We should split the table based on what will serve readers best, and we can't predict how often the pollsters are going to issue polls or when candidates are going to enter or leave the race. --Metropolitan90 (talk) 05:26, 1 April 2019 (UTC)

The discussion above is closed. Please do not modify it. Subsequent comments should be made on the appropriate discussion page. No further edits should be made to this discussion.

HarrisX polls

Some of these HarrisX polls are just a breakdown of each day in the entire period of one poll. This appears to be a strategy by this polling organisation to inflate their presence and report their polls three times each. We don't need to be following that, we should just report the full sample poll. Onetwothreeip (talk) 22:02, 5 April 2019 (UTC)

Yeah, I self-reverted because I was already doing the same on the GOP page (daily results for seven-day periods, so just include one entry for each week rather than every single day's results). Mélencron (talk) 22:26, 5 April 2019 (UTC)

Hyperlinks

It seems apparent now that we have a list of the candidates at the top of the article, we can do away with the mess of hyperlinks we have integrated into the tables. Onetwothreeip (talk) 23:35, 8 April 2019 (UTC)

Yeah, I'd be down for that – it's irritating to have to keep adding and removing them every time as well. Mélencron (talk) 23:38, 8 April 2019 (UTC)

Graphical summary

IainDavidson added this scatter diagram with trend lines but was quickly reverted by Mélencron: (For further discussion and notes about the original graphs see IainDavidson talk )

The chart

I can't find any existing consensus for excluding such material. I would like to see us include something like this, but I think trend lines can be a bit misleading (and possible WP:OR), so perhaps we can look at a moving average similar to this:

2016 example

I wonder if there is a way to do this natively without having to render a graphic each time the data changes.- Mr X 🖋 11:16, 15 April 2019 (UTC)

@MrX: I update a few on other articles, but they're just simple weighted moving averages rather than LOESS/Kalman trends. I can do something similar using FiveThirtyEight's csv, though again I'd have concerns about which candidates to include given the various inconsistencies between national polls right now in terms of who they're actually including, echoing these pieces (i.e., that some movements may be illusory and the results of simply adding or removing candidates from one poll or another; Morning Consult does so extremely conservatively, but others like Emerson frequently swap candidates in and out on a whim, which is why I doubt the usefulness of adding a graph at such an early point when there's only one pollster doing frequent national primary polls). Mélencron (talk) 12:12, 15 April 2019 (UTC)
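For readers unfamiliar with the term, a simple weighted moving average can be sketched in a few lines of Python. The comment above does not say which weights or window are used for the charts on other articles, so the linearly increasing weights and the five-poll window below are assumptions for illustration only.

```python
from typing import List


def weighted_moving_average(values: List[float], window: int = 5) -> List[float]:
    """Trailing weighted moving average with linearly increasing weights.

    For each poll, average it with up to `window - 1` preceding polls,
    giving weight 1 to the oldest poll in the window and the largest
    weight to the newest. `values` must be in chronological order.
    """
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        weights = range(1, len(chunk) + 1)
        smoothed.append(sum(w * v for w, v in zip(weights, chunk)) / sum(weights))
    return smoothed


# Hypothetical percentages for one candidate, oldest first.
series = [10.0, 20.0, 20.0, 30.0]
print(weighted_moving_average(series, window=2))
```

Unlike a LOESS or Kalman trend, this only looks backwards and fits no model, which is why it is cheap to recompute each time a new poll is added.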

Yes, I think something like the examples you showed would be awesome. As for which candidates to include, we could create graphs to correspond to each polling table, and maybe collapse all but the most current. There are other considerations as you point out, so let's see what other editors think.- Mr X 🖋 13:39, 15 April 2019 (UTC)

I mean, I don't see the point of not just creating a single graph (unless you're just referring to the way in which the candidates in each table differ)? I'm able to grab the underlying data including candidates in the "others" column without any real difficulty as the data is readily available. I can probably take a look at this sometime later today, but I'm also reluctant to add a graph before any other reputable sources have seen it appropriate to do so (e.g. FiveThirtyEight/HuffPost Pollster/other publications like the Economist/FT which frequently create their own as well). (I'm not talking about candidates like Klobuchar/Buttigieg who have been in every poll since they've declared – I'm more so talking about Yang/Swalwell/de Blasio/etc. who are less consistently included.) Mélencron (talk) 13:53, 15 April 2019 (UTC)

As a comment, RealClearPolitics recently added a chart to their polling average. PaperKooper (talk) 15:51, 15 April 2019 (UTC)

@MrX:: would this do? Using LOESS smoothing here, and including only polls that include Biden/exclude Clinton/exclude Obama and including only candidates included in at least 10 unique polls. (As a bonus, accompanying figures for every candidate can be found here.) Mélencron (talk) 20:57, 15 April 2019 (UTC)

I think this really highlights that we should separate Clinton/Obama polls from the rest. As for the graph, again it's a matter of keeping the highest polling until we decide somehow that we have enough. We don't need all that clutter on the very bottom of the graph that we often see. We should also consider only showing results on the graph for when they have started a campaign. Onetwothreeip (talk) 21:25, 15 April 2019 (UTC)

@Mélencron: Yes, I think that would work well, although I would prefer omitting the plot points/lines for candidates consistently polling very low in order to avoid clutter. I also agree with Onetwothreeip that we should only include plots for candidates once their campaigns have started. I know that's a deviation from the approach we have used for the tables, but I think it's a reasonable compromise for the sake of readability.- Mr X 🖋 21:51, 15 April 2019 (UTC)

ETA: We should also include footnotes about how the chart is plotted, and an explanation of why some candidates are omitted.- Mr X 🖋 21:54, 15 April 2019 (UTC)

How about separate plots for candidates polling at a higher/lower level? That was done in the past with at least another polling article, and I think that approach might work better here. Mélencron (talk) 21:55, 15 April 2019 (UTC)

Sure, that's a good solution.- Mr X 🖋 22:12, 15 April 2019 (UTC)

Gotcha. Will have to make a few tweaks, but this seems like the best solution – would also allow for just adjusting the axes accordingly as well. Mélencron (talk) 22:16, 15 April 2019 (UTC)

@MrX: Added in gallery form, given the number of individual graphs/permutations. (As is the usual practice with me, I'll choose a semi-regular basis to update any such charts – as with the German polling chart, I'll probably update it weekly following the release of a specific poll – INSA on Mondays in the German case, Morning Consult on Mondays/Tuesdays in this case.) Mélencron (talk) 00:00, 16 April 2019 (UTC)

I'm not actually decided on whether to only start lines from when the candidacy starts, I think it's something I'll have to see first. I don't think it really needs explanation why only the main candidates are pictured in the graph. As for another graph for minor candidates, I don't think that produces meaningful results. We could have some table that shows which candidates have polled at 1% three times though. Onetwothreeip (talk) 22:06, 15 April 2019 (UTC)

I've added a debate threshold table in the past but it was removed as it was considered a violation of WP:SYNTH; you already know that I keep a copy in my sandbox. Mélencron (talk) 22:08, 15 April 2019 (UTC)

It probably violates WP:SYNTH to say that certain candidates have qualified for the debates, but it wouldn't be a violation to simply have a table of which candidates have reached 1% and how many times they've done so. Have you been updating that table? Onetwothreeip (talk) 22:22, 15 April 2019 (UTC)

...you're basically just trying to find a roundabout way to include a debate qualification table, just without saying that it's a debate qualification table. Not gonna pass. Mélencron (talk) 22:29, 15 April 2019 (UTC)

I'm not in favour of including the table that's in one of your sandbox pages. First of all it would have to be vertical, but I'm not interested in including the particular polls in that table. Something like one column for names and one column for the number of polls in which they have reached at least 1%, up to three polls. We shouldn't say that they have qualified or that they are yet to qualify; that's completely a matter for the DNC, and it would be synthesis to say they have qualified from their poll results. Onetwothreeip (talk) 09:15, 16 April 2019 (UTC)

Just to add my two cents: the trendline in these graphs is so useful that any WP:OR concerns about them can be overruled by WP:IGNORE. Emass100 (talk) 14:42, 16 April 2019 (UTC)

More 2 cents. :) Would have been nice for someone to leave a note on my talk page to let me know there was a discussion about my add. I just checked today and noticed that the graph was removed off the page. Took me a while to figure out what happened. I am happy to adjust chart, colors, remove trend lines (really not enough data to include trend lines for now anyway). I also found an R script, could be adapted to pull data off the page and generate SVG image auto-magically. IainDavidson (talk) 16:30, 16 April 2019 (UTC)

The current graphs are all generated by an R script already, but they grab data from FiveThirtyEight's database as opposed to from this page – the data are identical, but it saves the hassle of having to parse the "others" column manually. Mélencron (talk) 16:35, 16 April 2019 (UTC)

Although I appreciate others taking over the graph effort that I started (it takes the burden off my shoulders for sure!), I feel that Mélencron was a bit of a bully (or just too aggressive in editing?). I felt my contributions were simply deleted and ignored. No communication was given to me that my graphs were helpful or even a good idea, nor any communication about working together on updating my graphs or ideas. He simply created his own graphs and replaced mine, along with reverting my edits. Since I see that he is a major editor/moderator of this page, I don't really see any point in continuing my efforts on future contributions to this page. I will simply leave this note and continue my graphs on my own. Any future comments can be left on my talk page. Please feel welcome to drop by for further discussion and notes about the original graphs; see IainDavidson talk. Any suggestions on other locations where I can link the graph are appreciated too. :)

--Iain (talk) 17:39, 16 April 2019 (UTC)

@IainDavidson: I pinged you yesterday so that you could join the discussion. It never hurts to propose something on the talk page first, especially if it will require a lot of effort to create. In any case, you helped with the collaboration with your WP:BOLD proposal. I think the issues with your graphic were the very large plot points and the straight-line trending, which is not all that useful in this situation.- Mr X 🖋 19:18, 16 April 2019 (UTC)

These graphs are far too small. We should at least have graphs that can be read, otherwise they are of no use to the article. I agree with MrX in removing the individual candidate graphs from the article, but I will hide the other graphs for now. We really don't need that many graphs even after removing the individual candidate graphs, I feel like Melencron really wants there to be a graph that includes all the minor candidates for some reason, even though a graph that has only the candidates polling on average more than around 5% is much more useful. Onetwothreeip (talk) 22:55, 16 April 2019 (UTC)

I think you're missing the point here. They're in gallery form because there are many of them, and to allow users to choose to expand and view any single graph that they might wish to, which I trust is something that readers can easily figure out. Since there are disagreements as to exactly which sets of candidates should be included, I created six permutations corresponding to the usual requests (on and off-site): [declared, not declared] and [all candidates, top-tier candidates, lower-tier candidates]. Users can figure that out. Second, as for the individual graphs, I think they're useful for viewing the trends for any individual candidate, especially if they're mixed in with the others on the main graph here. I don't believe that others here agree with your approach in choosing to include only a graph that focuses on the top-tier candidates rather than at least giving the option of viewing lower-polling candidates, and which is why MrX reverted your earlier removal already. Mélencron (talk) 01:09, 17 April 2019 (UTC)

Those images actually can't be expanded, they can only be opened up. I think we should do what has been done before, a graph that contains the higher polling candidates, and a graph that contains everyone else. The trends of individual candidates could then easily be identified in those images. Onetwothreeip (talk) 01:13, 17 April 2019 (UTC)

That's what was here before, but you removed it. (I also don't think the difference regarding the images you just mentioned actually matters substantively, either.) Mélencron (talk) 01:14, 17 April 2019 (UTC)

When they have to be opened up it's a new page, and the entire image has to be downloaded. Images are normally rendered in the article and compressed, where it's much easier to scroll through the article and back to the graph(s). The main problem with graphs containing candidates polling at very low results is that changes such as from 0% to 1% can seem more important than they really are. Onetwothreeip (talk) 01:22, 17 April 2019 (UTC)

Oh, I get what you're saying. All that requires is changing the scale to be consistent between graphs, like so: https://imgur.com/a/YpjM2Mf (I'm also not exactly sure why you think that using a gallery is an issue, as they're used fairly regularly across the site – the Media Viewer MediaWiki extension treats them identically to normal image thumbnails and directly transcluded images, but for some reason doesn't work with the latest Chrome update.) Mélencron (talk) 01:28, 17 April 2019 (UTC)

Most images are designed and displayed to be consumed on the same page as the rest of the article's content. I think we can all agree that this article can have a graph that is readable on the article itself, as we do on every other polling article. Onetwothreeip (talk) 02:21, 17 April 2019 (UTC)

Right, and that's where we disagree – I don't believe that the page as it was in its previous state was an impediment to readers; in fact, on two different occasions within the last 24 hours, I've encountered people with whom I'm acquainted (who aren't aware of my editing) sharing two different versions of the graph on Twitter and Discord without issue... I think you're making an issue where there's none, and not one that's actually based in any policy concerns.

I don't want to keep on ping-ponging ad infinitum so I'd rather get some further input on this from other editors who've periodically engaged here/within this discussion: @Metropolitan90, JFG, PaperKooper, Emass100, Mrodowicz, Yeah 93, DaCashman, XavierGreen, Impru20, and Guycn2: Mélencron (talk) 02:35, 17 April 2019 (UTC)

That's not my criticism at all. For people who want to download the images there is not much difference. I don't think we're actually disagreeing on anything? Onetwothreeip (talk) 02:52, 17 April 2019 (UTC)

That's not my point, though – I'm saying that readers are capable of hitting the left mouse button once, and that it's hardly expecting too much of readers to be able to do so. Mélencron (talk) 03:00, 17 April 2019 (UTC)

Sure, but we could have one or two main graphs displayed in 500x400 so that they don't have to download the entire image to see the graph(s), like we do with every other article. Onetwothreeip (talk) 03:06, 17 April 2019 (UTC)

And I disagree that that's somehow a problem! The reason there are six permutations in the first place is because there's obvious disagreement about which scenarios to give preference to: it's a deliberate choice to let people choose which to view – if they want to view one or another, they can click once on it to expand it. I fail to see the problem here. (I also think that 1. we do actually disagree on something, 2. we're not going to get anywhere just ping-ponging between the two of us, and as a result 3. it's more useful to wait for further input on this from other users than perennially ping-ponging between the two of us, who are in any case 72% of all discussions here...) Mélencron (talk) 03:08, 17 April 2019 (UTC)

The graph size for most graphs is not a problem. This graph should be displayed at a larger size than the others. We should keep the graphs for individual candidates in this article. Emass100 (talk) 03:31, 17 April 2019 (UTC)

I'd just rather not have readers wait for half a minute to load each graph when they've already loaded the article. I don't think there's really disagreement over what the graph should include, for the most part. I like the idea of one graph for the top candidates, and another graph with a different scale for the lesser candidates. Onetwothreeip (talk) 03:43, 17 April 2019 (UTC)

Is opening mostly-blank images really an operation that takes half a minute for the regular user in 2019? I think the issue of smaller graphs added in a gallery is greatly overstated and it is in fact the most elegant way in which to add several such charts to the article. PraiseVivec (talk) 12:07, 17 April 2019 (UTC)

I'm not a huge fan of the individual candidate graphs, but there's certainly nothing wrong with having a minimized section with them. I agree with Emass100 that the "All tested candidates" graph should be displayed at a larger size, similar to the other polling pages. PaperKooper (talk) 17:35, 17 April 2019 (UTC)

Yes, and certainly for me, depending on the connection. It's worse on mobile connections too. Onetwothreeip (talk) 22:45, 17 April 2019 (UTC)

Is it really? It's smaller than the main article in the total byte-size of the page including images, and also loads in roughly a third the time... unless your connection is really that abysmal? Mélencron (talk) 00:01, 18 April 2019 (UTC)

I'm not saying it's larger than the article itself, it's just something that would have to be opened up and loaded completely in addition. It's not just about connection though, it's also generally the computer speed. Anyway I'm in general agreement with the views expressed here. Onetwothreeip (talk) 11:36, 18 April 2019 (UTC)

Is there a reason why PNGs are being used? Wouldn't a SVG render similar to File:Australian federal election polling - 46th parliament - primary.svg be easier and less data intensive? Also, I'd suggest making the dots semi-transparent similarly to the graph. When they're not the graph looks way too busy and it's hard to gather information from it. — Preceding unsigned comment added by Catiline52 (talk • contribs)

@Catiline52: Yeah, I can generate SVGs instead, ~~but it doesn't make a difference as to how "data intensive" it is, as they're converted to PNGs anyway when embedded on articles~~ ~~the SVG files are in fact larger than the original PNGs, though the converted thumbnails are similar in size~~ (and the points are already semi-transparent, by the way, though I can just adjust the alpha further). Mélencron (talk) 02:16, 25 April 2019 (UTC)

Modified SVG versions (with adjusted transparency for points) attached below. The file size of the auto-generated PNG thumbnails of the SVGs is about a third smaller than the original PNG versions.

Other configurations
All tested candidates; All tested candidates; Higher-polling tested candidates; Higher-polling tested candidates; Lower-polling tested candidates; Lower-polling tested candidates; All declared candidates; Lower-polling declared candidates

Individual candidates
Joe Biden, Cory Booker, Michael Bloomberg, Sherrod Brown, Steve Bullock, Pete Buttigieg, Julian Castro, John Delaney, Tulsi Gabbard, Kirsten Gillibrand, Kamala Harris, John Hickenlooper, Eric Holder, Jay Inslee, Amy Klobuchar, Terry McAuliffe, Beto O'Rourke, Tim Ryan, Bernie Sanders, Eric Swalwell, Elizabeth Warren, Marianne Williamson, Andrew Yang

Mélencron (talk) 03:56, 25 April 2019 (UTC)

I think the best way to go here would be for the article to have a graph of the higher polling candidates, including those who aren't candidates yet (Joe Biden), but a slightly different line or colour for pre- and post-campaign launch. Onetwothreeip (talk) 04:05, 25 April 2019 (UTC)

@Catiline52, MrX, and PaperKooper: for input again. As far as I can tell, there are currently three users in favor of using a graph including all candidates, while the others are agnostic, and you would rather only use a graph with the top-polling candidates. Mélencron (talk) 04:24, 25 April 2019 (UTC)

I would be as inclusive as possible when it comes to including candidates. It's just not necessary or wise to include all of them. When the bottom of the graph gets messy, that's the time to stop adding candidates into the graph. "All candidates" is now more than twenty, and at that point the bottom 3% of the graph more resembles multi-coloured spaghetti than easily observable information. Onetwothreeip (talk) 05:08, 25 April 2019 (UTC)

I don't have any strong opinions about the specifics. What's important to me is that we show the trends graphically and that the graphs are not so inclusive that they become unreadable. I would be fine with leaving out candidates who consistently poll low. - Mr X 🖋 17:08, 25 April 2019 (UTC)

After some modifications, I've gone with a single-graph solution with the others accessible on Commons; hope this proves satisfactory. Mélencron (talk) 23:28, 30 April 2019 (UTC)

That looks excellent, thanks! — JFG ^talk 08:26, 1 May 2019 (UTC)

Simplify post-Biden

Now that Biden has joined the race, there is no point separating "declared candidates" from "all high-polling candidates". I would further suggest limiting the graph to candidates who have polled at 3% or above since January 2019. The rest is just noise. — JFG ^talk 16:38, 25 April 2019 (UTC)

Also, I would start the charts in 2018, because the polls from 2016 and 2017 are not statistically significant (very few polls, total guesswork on candidates). — JFG ^talk 16:42, 25 April 2019 (UTC)

Sorry, I had not noticed that the proposed charts already start in 2019. Looks good to me as well. Nice charts, btw. — JFG ^talk 16:43, 25 April 2019 (UTC)

Polls that matter to the qualification of the first two debates

Could we highlight the polls that matter to the first two debates? —Wei4Green | 唯绿远大 (talk) 05:49, 5 May 2019 (UTC)

No, because it is not clear which of them are counted, and the DNC has declined to clarify. Mélencron (talk) 12:39, 5 May 2019 (UTC)

We should certainly make it clear which candidates have polled at least 1% in three national polls, but we can't make a determination as to who will be in the debates or what counts for qualification. Onetwothreeip (talk) 06:52, 7 May 2019 (UTC)

Rounding the 1%

@Mélencron: Regarding our recent back-and-forth edits on rounding methods, I believe your edit summaries deserve to be memorialized on the talk page.

JFG said: Inslee and Swalwell are under 0.5% (2 respondents out of 440), that rounds down to 0, not up to 1.[4]

Mélencron said: rv; can't believe I'm nitpicking over this, but it's not ~~something that's not~~ correct – the individual numbers of respondents displayed are the rounded *weighted* n's for each candidate, whereas the overall reported sample size is the *unweighted* sample size – the two aren't comparable, so the reported toplines are simply used/rounded (the actual *weighted* toplines differ slightly; e.g. Buttigieg is at 7.5002% and not ~~75.0%~~ 7.50% as (unweighted candidate n)/(weighted overall n) would suggest)[5] As a further demonstration of this point (to make it clear what I'm saying), note that the fact that all of the n's displayed in the tables are weighted mean that they don't add up to 440 in all cases – adding across rows/columns, for example, "race"/"age" sum to 439; adding every response option vertically adds to 437, etc.[6]

I still disagree but won't nitpick the nitpicking. — JFG ^talk 21:06, 6 May 2019 (UTC)

The crux of the issue is that you're (mistakenly) assuming that the numbers reported in the table are unweighted n's: they're not, and in general, we shouldn't attempt to recalculate the results of polls based on their raw sample data. The tables show rounded weighted sample sizes, so you have things like Michael Bennet appearing to have support from 5 respondents but the sum of every demographic group for him is 6 – since the weighted number of respondents for Bennet (not displayed on the page) is 5.43, not 5; likewise for other lower-polling candidates (1.48, not 1, for Hickenlooper; 2.25, not 2, for Inslee; Castro/Delaney/Gillibrand/Bennet all appear to have the same number of respondents, but the weighted n's range from 4.92 to 5.43). Mélencron (talk) 21:13, 6 May 2019 (UTC)
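The rounding point above can be illustrated with a short Python sketch. It uses the figures quoted in the comment (a displayed n of 2 for Inslee against a weighted n of 2.25), but the denominator of 440 is only an assumed stand-in: the comment says 440 is the unweighted sample size and does not give the weighted total, so this shows how the discrepancy can arise rather than reproducing the pollster's actual calculation.

```python
import math


def topline_pct(n_candidate: float, n_total: float) -> int:
    """Candidate share as a whole percentage, rounded half up."""
    return math.floor(100 * n_candidate / n_total + 0.5)


# Assumption: 440 is used as the denominator purely for illustration.
ASSUMED_TOTAL = 440

# Recomputing from the rounded n displayed in the crosstab: 2 / 440 = 0.45%
inslee_from_displayed_n = topline_pct(2, ASSUMED_TOTAL)

# Using the unrounded weighted n quoted above: 2.25 / 440 = 0.51%
inslee_from_weighted_n = topline_pct(2.25, ASSUMED_TOTAL)

print(inslee_from_displayed_n, inslee_from_weighted_n)
```

Because the displayed count has already been rounded once, dividing it again can land on the other side of the 0.5% boundary. This supports quoting the pollster's reported topline instead of recalculating it.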

I understand, but all these black arts of weighting sound quite irrelevant when the methodological error margin is 5% anyway, due to the combination of a low population sample size and a large number of candidates to choose from. — JFG ^talk 12:49, 7 May 2019 (UTC)
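As a rough check on the "5%" figure mentioned above, the textbook margin of error for a proportion can be computed as follows. This is a sketch under the usual simplifying assumptions (simple random sampling, 95% confidence, no design effect from weighting), which the pollster's own published margin may not share.

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Margin of error in percentage points for a proportion p and sample size n."""
    return 100 * z * math.sqrt(p * (1 - p) / n)


# Worst case (p = 0.5) for a sample of 440 respondents.
worst_case = margin_of_error(440)

# The same sample for a candidate polling at about 1%.
near_one_percent = margin_of_error(440, p=0.01)

print(round(worst_case, 2), round(near_one_percent, 2))
```

The worst-case value is a little under 5 points, while the interval around a 1% result is much narrower in absolute terms, although still roughly as large as the result itself.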

Klobuchar vs. Booker

Amy Klobuchar appears to be doing as well as or better than Cory Booker in some of the more recent polls. I’d say they’re about neck and neck now, and that if he is included in the graph and chart, she should be too. 63.231.140.53 (talk) 19:28, 23 May 2019 (UTC)

Also the most recent CNN power ranking has her ahead of him: [7] 63.231.140.53 (talk) 19:51, 23 May 2019 (UTC)

Well, not quite, but if we're talking about Booker/Klobuchar, I'd be more inclined to consider removing one than adding one of them (and some punditry about "power rankings" shouldn't have bearing here). Also, between the 10 most recent pollsters, Klobuchar has only hit 3% in one, so again not inclined to add her. Mélencron (talk) 21:58, 23 May 2019 (UTC)

~~I think I would prefer removing both rather than keeping both but we can afford to wait.~~ Actually, removing both would seem to be untenable given how big the Other column would be. I think for now keeping one is the way to go, but further changes would probably necessitate the current table being split in two somewhere. Onetwothreeip (talk) 02:25, 24 May 2019 (UTC)

There's no such thing as "removing both", only one is currently in the table. My suggestion would just be to create a new table should Booker fall more consistently to only 1–2% results as Klobuchar has for months, but we're not at that point yet. Mélencron (talk) 04:27, 24 May 2019 (UTC)

I'm just making the distinction between two ideas. I also want to make the point that we can split the table at any month, it doesn't have to be from when we decide to split it. I also think we shouldn't regard anything from a "CNN power ranking" or anything like this. We should remain outside media commentary as much as possible. Onetwothreeip (talk) 05:05, 24 May 2019 (UTC)

Agree to status quo. Revisit when Klobuchar breaks 3% consistently or when Booker goes down to 2% in ten polls. — JFG ^talk 06:31, 24 May 2019 (UTC)

Deletion of "Statewide opinion polling for the 2020 Democratic Party presidential primaries"

Why was the "Statewide opinion polling for the 2020 Democratic Party presidential primaries" article deleted? - Ich bin es einfach (talk) 22:05, 23 June 2019 (UTC)

Why was the article deleted? Alhanuty (talk) 04:27, 24 June 2019 (UTC)

Statewide opinion polling for the 2020 Democratic Party presidential primaries. --Comment by Selfie City (talk about my contributions) 22:09, 26 June 2019 (UTC)

[1] Sherrod Brown and Amy Klobuchar with 2%; Tulsi Gabbard, John Hickenlooper, and John Kerry with 1%; Steve Bullock, Pete Buttigieg, Julian Castro, John Delaney, Eric Garcetti, Kirsten Gillibrand, Eric Holder, Jay Inslee, Terry McAuliffe, and Gavin Newsom with 0%; other with 2%

[2] Amy Klobuchar with 2%; Sherrod Brown, Julian Castro, Tulsi Gabbard, Kirsten Gillibrand, John Hickenlooper, Eric Holder, and Andrew Yang with 1%; John Delaney, Jay Inslee, and Terry McAuliffe with <1%; Pete Buttigieg with 0%; other with 1%

[3] Julian Castro with 8%; Sherrod Brown with 4%; John Delaney and Tulsi Gabbard with 2%; Kirsten Gillibrand, Amy Klobuchar, and Richard Ojeda with 1%; other with 6%

[4] Sherrod Brown and Kirsten Gillibrand with 2%; John Delaney with 1%; Julian Castro, Tulsi Gabbard, and Terry McAuliffe with 0%

[5] Tulsi Gabbard and Kirsten Gillibrand with 2%; Julian Castro with 1%; Michael Avenatti with 0%; other with 3%
