Talk:COVID-19 pandemic in the United States/Archive 13

From Wikipedia, the free encyclopedia

Semi-protected edit request on 3 June 2020

The page has not yet updated the reported cases for Massachusetts as of June 2, 2020. The page currently states the daily cases for that day as 0, when it hit over 300. This is causing some issues on social media of people believing there were 0 reported cases yesterday, despite the numerous other sites. 2601:18F:8101:C1D0:1C89:7D39:2EE1:B8D8 (talk) 15:48, 3 June 2020 (UTC)

 Not done: please provide reliable sources that support the change you want to be made. —Tenryuu 🐲 ( 💬 • 📝 ) 16:39, 3 June 2020 (UTC)
Also, this page doesn't cover states in as much detail as their own pages, like COVID-19 pandemic in Massachusetts, and I see a value has been added there which does not match the one you are claiming. —Tenryuu 🐲 ( 💬 • 📝 ) 16:41, 3 June 2020 (UTC)
I just copied the 4pm updates from the COVID Tracking Project into the state graphs, so the figures should be a little more accurate. For Massachusetts there was a big jump on Tuesday, June 2 (IIRC); I think it may be tied to a change in case accounting, but I am not certain. Scotty.tiberius (talk) 18:47, 5 June 2020 (UTC)

COVID-19 data tracking platform: Department of Health and Human Services COVID-19 Protect Now

Department of Health and Human Services COVID-19 Protect Now became operational on April 10:

—:— T3g5JZ50GLq (talk) 04:38, 6 June 2020 (UTC)

Semi-protected edit request on 11 June 2020

Changes for North Carolina in the "COVID-19 pandemic in the United States by state and territory" Table under "Timeline"

Change cases in North Carolina from 30,777 to 38,171 per NCDHHS numbers reported here: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_North_Carolina

Change recovered in North Carolina from 18,860 to 27,172 per NCDHHS numbers reported here: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_North_Carolina

Change deaths in North Carolina from 969 to 1,105 per NCDHHS numbers reported here: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_North_Carolina

Changes are requested in order to reflect the most recent data available, especially as Google trackers appear to be scraping this table. Other states may be in similar circumstances. If it is possible to have the numbers be linked back to the NC page that may be more efficient but I am not familiar enough with the interface to know for certain if that is an option. Epdoughty (talk) 14:03, 11 June 2020 (UTC)

Graphs for states with < 50,000 cases

In Number of U.S. cases by date, I think it would be useful to have graphs showing number of cases for each state. I don't know what workload that would involve, if it's a script generating them or someone doing by hand. Right now, the graph for > 100,000 cases has 5, and 50,000-100,000 has 4. So, that's 41 more states which could certainly be difficult to handle. Perhaps just showing hotspot states would be useful. Like, I think it would be useful to be able to see Arizona's graph, granted they'll be over 50,000 in 10 days if even 3.5% growth happens. I'm sure there's other states that would be useful to see. Darlingm (talk) 04:08, 16 June 2020 (UTC)
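For reference, the compound-growth arithmetic behind the "3.5% growth for 10 days" remark can be sketched as follows. The starting count is a hypothetical placeholder, since Arizona's figure on that date is not stated in the comment; only the 3.5% rate, the 10-day horizon and the 50,000 threshold come from it.

```python
def project_cases(current_cases: float, daily_growth: float, days: int) -> float:
    """Project a cumulative case count forward under constant daily growth."""
    return current_cases * (1 + daily_growth) ** days


# 3.5% daily growth compounded over 10 days multiplies the count by about 1.41.
growth_factor = project_cases(1.0, 0.035, 10)

# Hypothetical starting count (assumption, not taken from the discussion).
hypothetical_start = 36_000
projected = project_cases(hypothetical_start, 0.035, 10)

print(f"growth factor over 10 days: {growth_factor:.3f}")
print(f"projected count from {hypothetical_start:,}: {projected:,.0f}")
```

Under these assumptions, any state with roughly 35,500 or more cases would cross 50,000 within the 10 days.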

"American Corona" listed at Redirects for discussion

A discussion is taking place to address the redirect American Corona. The discussion will occur at Wikipedia:Redirects for discussion/Log/2020 June 17#American Corona until a consensus is reached, and readers of this page are welcome to contribute to the discussion. Pandakekok9 (talk) 08:34, 17 June 2020 (UTC)

Addition of death toll surpassing World War I

I'm still not autoconfirmed. Can someone add, in the section that compares the death toll with those of the Vietnam War and the Korean War, that in June the death toll surpassed that of World War I (116,516)?

That sounds like a good idea 2600:1702:2340:9470:C0F1:424C:6BCB:AA03 (talk) 02:43, 17 June 2020 (UTC)
That would be nice, and I believe that data to be generally correct, but do we have a source indicating the comparison? Can we simply cite two which indicate each number individually? wobblewatch (talk) 23:37, 17 June 2020 (UTC)
I've added the new information. Sources from Time Magazine, Johns Hopkins, and The Week have been included. wobblewatch (talk) 23:55, 17 June 2020 (UTC)

Americans are gargling with bleach and drinking household cleaners

See here: "A new Centers for Disease Control report indicates Americans are putting household disinfectants including bleach into their bodies because they believe such practices can ward off coronavirus". Count Iblis (talk) 15:46, 7 June 2020 (UTC)

"4% of those surveyed admitted they’d drank or gargled household agents including bleach. A staggering 18% of participants confessed they’d applied cleaning agents to their skin. Nearly 10% inhaled fumes from potentially toxic household disinfectants." Poor guys, they trust their President so much [9]! I guess a much better approach would be to wash your throat with listerine rather than lysol. Still do not drink it. My very best wishes (talk) 21:46, 11 June 2020 (UTC)
Listerine is going for $8 a bottle here... there is no bleach, rubbing alcohol or gloves 2600:1702:2340:9470:7103:703D:723F:1DC8 (talk) 22:33, 13 June 2020 (UTC)
One can only wonder whether it is an anti-science bias [10] or something for a psychologist. Why would anyone feel so dirty as to use household cleaner as a remedy [11]? My very best wishes (talk) 03:54, 19 June 2020 (UTC)
I can't believe anyone would do it. I have a very strong bias against Trump, but I can't believe 4 out of a hundred people would do that 2600:1702:2340:9470:2CA0:429A:C82E:3C6A (talk) 17:42, 19 June 2020 (UTC)
It isn't about President Trump; this was happening before any remark from him, and before any media or comedian distortions magnified it 100x. After a couple of months of disinfectants suddenly being commonplace, caution is lost, so accidents and experiments happened. Many folks, not knowing better, wind up in the same place. Cheers Markbassett (talk) 02:42, 20 June 2020 (UTC)
Do you believe that Trump's statements did or did not encourage even one person to gargle with bleach? Yes or no, comrade? 2600:1702:2340:9470:5495:F7A9:2143:A90B (talk) 01:20, 21 June 2020 (UTC)

Should this be added in the article (with the sources)

The USA is the ONLY nation out of the top 20 in the world to report daily and weekly increases in new infections. We're entering the second wave in the 34 states plus DC that reopened too early and the 9 states that never locked down. Only NY, NJ and CT in the Northeast and the west coast states WA, OR, AK and HI have largely avoided this. It's uncertain whether CA, which has entered stage 3 of its reopening process (stage 2 was in May), has had an uptick, despite reduced social distancing at Memorial Day gatherings nationwide, but the southern border cities have their hospitals and testing sites full of people from nearby Mexico (Tijuana across from San Diego, and Mexicali near Imperial); this should be true for AZ, NM and TX as well. (The US now has more daily cases than all of Europe combined.) 2605:E000:100D:C571:4C1D:EB7D:B365:D7B4 (talk) 00:54, 14 June 2020 (UTC)

Yes it should be in the article 2600:1702:2340:9470:7103:703D:723F:1DC8 (talk) 04:57, 14 June 2020 (UTC)
No, this is too confused and without text or cites - but just too many unknowns and other views exist. BBC notes cases overall not falling US-wide because declines in initial hotspots is offset by increases in initially untouched regions, though oddly deaths are trending downwards. Dr. Fauci regards it all as still the First wave playing out. Cheers Markbassett (talk) 02:51, 20 June 2020 (UTC)
And yet we just had a spike... you lost me at "34 states+", but it's relevant. Condense it, make it readable, include it 2600:1702:2340:9470:5495:F7A9:2143:A90B (talk) 05:45, 21 June 2020 (UTC)

Semi-protected edit request on 22 June 2020

United States 1 case active 204.63.137.150 (talk) 00:48, 22 June 2020 (UTC)

 Not done: it's not clear what changes you want to be made. Please mention the specific changes in a "change X to Y" format and provide a reliable source if appropriate. Username6892 01:01, 22 June 2020 (UTC)

Semi-protected edit request on 22 June 2020

i know things Oblivion1612 (talk) 13:30, 22 June 2020 (UTC)

 Not done: it's not clear what changes you want to be made. Please mention the specific changes in a "change X to Y" format and provide a reliable source if appropriate. Mdaniels5757 (talk) 18:24, 22 June 2020 (UTC)

Charts II

I see my previous comments have been removed, not sure why. I've two comments. To reiterate my previous comment, the logarithmic graph is useless, imho, and appears to be someone's pet project. I suggest it be removed because it doesn't convey a useful amount of information (after the initial phase where the % increase was high). The second comment is that the "COVID-19 cases in the United States" bar chart should 1) remove recoveries, because they are NOT reported by every state, and pretending that the number of "recoveries" is as valid as the number of cases and deaths is misleading; 2) report the number of cases and deaths on a given date and not the % increase (I doubt most people looking at the chart know that 1% growth is exponential growth, and as that number jumps around, it is hard to translate from the percentage to 'see' the trend); and finally 3) include at least the last 60 days, since limiting it to 15 days seems to be intended to obfuscate and not illuminate. 174.130.70.61 (talk) 20:11, 16 June 2020 (UTC)

174.130.70.61, for your last point, you can change the chart to see any individual month or multiple months by clicking on them at the top. JoelleJay (talk) 20:22, 22 June 2020 (UTC)

Graph for 50,000-100,000 cases

As it stands, the graph's color coding poses difficulties for those with common color vision deficiencies: Florida's and Virginia's lines are in a zone of confusion for what is termed "red-green colorblindness", and Pennsylvania's and North Carolina's lines are in a zone of confusion for less common types. Jidosha (talk) 21:15, 22 June 2020 (UTC)

Overuse of Trump photos

This article currently contains seven different photos/videos of Donald Trump, plus several more of Mike Pence and other administration officials. A few Trump photos are probably warranted, but seven is way overkill. Can we swap some of them out for examples that better illustrate aspects of the pandemic besides just the Trump administration's response to it? {{u|Sdkb}}talk 00:33, 28 April 2020 (UTC)

I agree. buidhe 05:04, 29 April 2020 (UTC)
Fewer pics of Trump, please. We're suffering enough already... ---Another Believer (Talk) 15:21, 1 May 2020 (UTC)
Yikes RopeTricks (talk) 10:23, 15 May 2020 (UTC)
Agreed, article could use a wider range of photos instead of multiple photos of politicians. •Shawnqual• 📚 • 💭 17:50, 22 May 2020 (UTC)
As of June 8 this page still looks like a White House press release from the way it is illustrated. Per the above comments I reduced the number of Trump shots from 8 to 3, all in the President Trump section, and replaced some of them by equivalent pictures when available. Place Clichy (talk) 06:48, 8 June 2020 (UTC)
You missed the Trump photo in the COVID-19 pandemic in the United States#Drug therapy and vaccine development section. JRSpriggs (talk) 14:35, 8 June 2020 (UTC)

I am making a suggestion since I don't know how to work with the graphics: it seems to be an oversight that neither the phrase "flatten the curve" nor the widely circulated graphics illustrating that concept appear in the article. I'm not very proficient here, having only made small data edits, or I'd try to do it myself. I can't even figure out how to sign this.

Semi-protected edit request on 19 June 2020

Add Rich and Beaver counties upon verification (per Utah article) 2600:2B00:7615:CF00:ED1A:AF92:DDD3:8AAF (talk) 17:57, 19 June 2020 (UTC)

 Not done: it's not clear what changes you want to be made. Please mention the specific changes in a "change X to Y" format and provide a reliable source if appropriate. — Tartan357  (Talk) 00:08, 23 June 2020 (UTC)

Graphs for states with > 50,000 cases

Shouldn't Georgia, Maryland, and Virginia be added to the chart? And probably soon also Arizona, Connecticut, Louisiana, North Carolina, Ohio... --Qumranhöhle (talk) 12:26, 18 June 2020 (UTC)

And what happened to Washington State? It was on there the first couple of months, but is not now. With the people camping for several blocks in that state, there is general interest in the effects there. --174.130.3.180 (talk) 06:01, 26 June 2020 (UTC)

Questionable value of case counts

This sentence in the first paragraph of the lead may be misleading:

As of May 27, 2020, the U.S. had the most confirmed active cases and deaths in the world.

The reason for that opinion is because as of today:

  • The U.S. per capita death rate is seventh in the world, with the top six all in Europe. It's less than half the rate of Belgium and a bit more than half of the UK's.
  • According to the WHO, total deaths worldwide are 470,192 and cases total 9,033,951. That puts the average worldwide fatality rate at 5.20% (deaths/cases).
  • The global chart of fatality rates by country shows that for the U.S. it is 5.26%. From the same chart, it can be seen that of the 188 countries listed, the U.S. has the closest fatality rate to the world average.
  • The average fatality rate for the top 5 EU countries is 14.69%. That's about 2.8 times the U.S. rate.
Addendum: To clarify a bit, the EU countries referred to do not include Germany. While Germany has a slightly larger (1.3 times) population than France, they both have similar demographics and both have similar total cases. Yet Germany has less than a third the fatality rate (4.64%) of France (15.03%) and less than a third the total deaths of France. Light show (talk) 17:22, 23 June 2020 (UTC)
  • Assuming that the population demographics for those EU countries are similar to those of the U.S., it would imply that those EU countries are giving roughly a third as many tests per capita as the U.S. For example, for the UK to have the WHO average fatality rate, they would have 812,000 cases instead of the current 306,000.
  • If those figures are correct, then it might help explain why the per capita death rates are strikingly higher in the EU than the rest of the world. For instance, if those EU countries have chosen to reduce their testing to only about a third of the testing rate done by the U.S., then it implies many more people are dying there without ever having been tested.
Addendum: Came across some recent reports related to this:
  • Many of the major EU countries [12] will test only those with symptoms, whereas in the U.S. and Canada, anyone who wants a test can get one, with or without symptoms (asymptomatic). Those might be people who live with or have been around others they think may be infected.
  • A number of reports have found that about half of people who tested positive in random samples, had no symptoms.
  • In Indiana, they estimated that only about 1 of every 11 Covid infections were identified after testing only those in either high-risk groups or with symptoms. According to their lead expert, "What we knew through conventional detection methods -- testing symptomatic people and those at high-risk for COVID-19 -- was just the tip of the iceberg." --Light show (talk) 21:39, 23 June 2020 (UTC)
  • The chart also shows that the range of fatality rates (deaths/cases) is wide enough to make the figures meaningless: e.g., France, 15%, and Russia, 1.4%. The world average is 5.26%. Recall, for instance, that in Europe a few months ago they only tested people who made it to the hospital with severe symptoms. Deaths at home or in senior care centers were often excluded. The WHO never set a global standard for all countries, so comparisons are of little value.

So getting back to whether that sentence up top may be misleading, it seems to be. The U.S. apparently now gives a much larger ratio of tests to the population, ranks 38th in the world for its case fatality rate, and 7th in its per capita death rate. Therefore, a lead sentence which states that the U.S. has the "most cases and deaths in the world," without some context, should best be deleted. --Light show (talk) 05:40, 23 June 2020 (UTC)
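The fatality-rate arithmetic in the list above can be reproduced with a short sketch. The inputs are the figures quoted in this thread (WHO totals, the 5.26% U.S. rate and the 14.69% top-5 EU average); the function and variable names are illustrative only.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Case fatality rate in percent: deaths divided by confirmed cases."""
    return 100.0 * deaths / cases


# WHO worldwide totals quoted in the thread.
world_cfr = case_fatality_rate(470_192, 9_033_951)

# Rates quoted in the thread (percent).
us_cfr = 5.26
top5_eu_cfr = 14.69

ratio = top5_eu_cfr / us_cfr          # how many times the U.S. rate
percent_higher = (ratio - 1) * 100    # the same gap expressed as "percent more"

print(f"world CFR: {world_cfr:.2f}%")
print(f"EU top-5 vs U.S.: {ratio:.2f}x, i.e. {percent_higher:.0f}% higher")
```

The ratio comes out at roughly 2.8, which is "about 2.8 times" or "about 180% higher" depending on phrasing; the two should not be confused. The sketch says nothing about why the rates differ, which is the actual point under dispute.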

It is misleading as it was April 11th when US deaths overtook Italian.[1] If you are afraid that people don't understand the difference between per capita and absolute terms, well, there is simple wikipedia for those with difficulties understanding the language. Of 19 (talk) 02:28, 25 June 2020 (UTC)
The CDC now estimates that 23 million Americans, or more than 5%, unknowingly had COVID-19 in the first half-year of the pandemic (Dec 2019-Jun 2020). https://www.aol.com/article/news/2020/06/25/cdc-says-covid-19-cases-in-us-may-be-10-times-higher-than-reported/24537120/ They previously estimated up to 13% of the US population, or up to 40 million; then again, I can't imagine that 10% of the US population may currently be walking around with the virus without symptoms or without having been able to get tested. 2605:E000:100D:C571:A8BB:CE5:5FFF:7B6A (talk) 21:21, 25 June 2020 (UTC)
Good question about the ability to get tested, although things have changed since February. Over the last two months, the U.S. has been testing about double the share of its population that all EU countries except Denmark have, per reliable sources. Not sure of the connection, but the six countries in Europe that have a much higher per capita death rate also test only about half the share of their population that the U.S. does.
So another new question is why the EU would now even consider banning travelers from the U.S. While they say it's because the U.S. number of coronavirus cases per 100,000 is higher, that seems like a strange rationale, considering it tests twice as much per capita. If the U.S. tested at the average that Europe does, then the U.S. would have under half as many cases. Then add the fact that for most of the countries in Europe (excluding Germany) the likelihood of dying (fatality rate) once infected is two to three times greater; wouldn't it be more logical for the U.S. to ban travelers from Europe? Is there something missing here to explain the EU's rationale? --Light show (talk) 18:48, 27 June 2020 (UTC)

References

Graphs of new cases by state

I was discouraged by a lack of Graphs of new cases by state, but have found a way to see those Graphs from the data in the encyclopedia:

  1. Do a Google search: for example, Covid-19 map
  2. Scroll down in the Google search results, to the Daily change heading
    • For example: New cases § United States § New York § All time
      • You can select from drop-down menus to suit your question
  3. Read your particular state's graph, which is what I was seeking
    • The data for the graphs is from Wikipedia, according to the Google annotation

So Thank You to the editors who are faithfully entering the daily data; it is getting into the graphs. --Ancheta Wis   (talk | contribs) 01:59, 30 June 2020 (UTC)

Meaning of the dashed line on the logarithmic deaths chart

The article currently contains a graph which plots the number of cases and the number of deaths due to COVID-19. The caption reads as the following:

"Number of cases (blue) and number of deaths (red) on a logarithmic scale"

There is, however, an additional dashed line on the plot which is not explained in the caption. It appears to be describing deaths but actually has the total number of dead decreasing in the months of May and June, which doesn't make sense for obvious reasons.

The dashed line should either be explained in the chart caption, or (if it is a plotting artifact) be removed to avoid confusion. 2600:1700:68D0:6F10:2C30:54AE:5B3E:46AF (talk) 14:44, 27 June 2020 (UTC)

If you click on the graph and then click on the 'More details' button, you can read a more detailed caption. The dotted black line represents the number of reported deaths over the last ten days. — Preceding unsigned comment added by 184.162.102.86 (talk) 17:13, 3 July 2020 (UTC)
Could I add that I think the 'lines of best fit' should be removed since they are not helpful, may cause people to draw incorrect conclusions about likely future trends, and arguably constitute original research? Perokema (talk) 18:05, 27 June 2020 (UTC)
Agreed. pauli133 (talk) 12:09, 5 July 2020 (UTC)

Temporal "benchmarks" on the Progression et al. charts under Statistics

Any chance of bringing back more frequent (like weekly) dates on the X-axis on those charts? The notations on the X-axis need to be corrected, anyway. If you count the daily dots, starting from "July," that last dot in those curves is for July 21 or 22 .... TrilliumLady (talk) 05:19, 17 July 2020 (UTC)

Semi log plot hiding the true situation

Could I suggest that the semi-log plot near the top of the page is of little value at the moment, as the marked changes in infection rates are resulting in small changes when plotted this way. These graphs are only of real use to people used to looking at semi-log plots, which is a reasonably small percentage of the population. I would much prefer to see the number of cases vs time plot (on linear scales) shown near the end of the page here, as it would bring home to the casual reader what the true situation is. Ditto the number of deaths vs time plot. — Preceding unsigned comment added by 202.172.113.133 (talk) 06:43, 10 July 2020 (UTC)

A semi-log plot is useful, but the above is right in that a normal plot would be useful as well. The problem is with the graphical representation in the Timeline section, which has days going vertically down instead of giving a normal plot with time on the x-axis, and one day takes up a whole line of printed text. That representation further gives changes in cases as a percentage against the previous day, which is misleading since that is not adjusted for changes in daily tests; in fact, the semi-log plot is misleading in the same regard: it suggests exponential growth trends in cases (as straight lines on the semi-log plot) which are not adjusted for growth of daily tests and therefore misrepresent the true growth rates, typically by depicting higher growth than actual as long as daily tests are increasing.
The plot sought by the above anon seems to be one similar to what is given at https://www.worldometers.info/coronavirus/country/us/, "Daily New Cases in the United States", but here again beware that this is not adjusted for growth of daily tests and therefore is misleading. For plots that do not suffer from this misleading effect, see links at #Daily new cases as percentage of tests AKA test positivity rate above. --Dan Polansky (talk) 10:10, 11 July 2020 (UTC)

My biggest complaint with that chart is the *unlabeled* % increase (or at least I think that's what it is, because the chart doesn't say!) It's misleading. 2% of a 3,429,626 case load is way different than 2% of, say, a 1,005,522 case load. That % makes the daily increase look minuscule, and it's not. It's at record levels. I believe it was JHU that listed today's increase as 81,000 cases. Worldometers lists the day's increase at 73,000-and-change. TrilliumLady (talk) 05:27, 17 July 2020 (UTC)
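The point about percentages hiding absolute growth can be checked with the two case loads quoted in the comment above. This is a minimal sketch; the function name is illustrative.

```python
def daily_increase(total_cases: int, percent_growth: float) -> float:
    """Absolute number of new cases implied by a percentage increase."""
    return total_cases * percent_growth / 100.0


# Case loads quoted in the comment above, each with the same 2% daily growth.
increase_large = daily_increase(3_429_626, 2.0)
increase_small = daily_increase(1_005_522, 2.0)

print(f"2% of 3,429,626 = {increase_large:,.0f} new cases")
print(f"2% of 1,005,522 = {increase_small:,.0f} new cases")
```

The same 2% label corresponds to roughly 68,600 new cases in one case and roughly 20,100 in the other, which is why an unlabeled percentage column is easy to misread.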

Daily new cases as percentage of tests AKA test positivity rate

Daily new cases as percentage of tests is a very relevant figure to look at.

Charts of it on a per U.S. state basis are available here:

A chart on the U.S. level is available here:

I could create a plot for Wikipedia, using data from https://covidtracking.com/data/download. However, they have some curious data license here: https://covidtracking.com/about-data/license. I have no idea what kind of thing a "data license" could be, in terms of U.S. law; data is not copyrightable as far as I know. Furthermore, the U.S. has no database protection law. Could someone clarify whether that "data license" has any force at all? Could we use the data to create a plot? Alternatively, is there a different data source for U.S. test counts that does not confuse the user with a "data license"? --Dan Polansky (talk) 17:18, 8 July 2020 (UTC)

Charts for various countries including U.S. here:

--Dan Polansky (talk) 18:04, 8 July 2020 (UTC)

A long page at OWID on testing:

--Dan Polansky (talk) 10:00, 16 July 2020 (UTC)

Test positivity rate for US, calculated from Our World in Data, from owid-covid-data.csv[13], smoothed via 7-day moving average:

--Dan Polansky (talk) 14:44, 16 July 2020 (UTC)

Awk oneliner for Windows to calculate the above rate from new cases and new tests:

awk -F, "$1==\"USA\" && $14>0 {printf \"%s %.1f\n\", $4, 100*($6/$14)}" owid-covid-data.csv

By changing the above filter from USA to something else, you can get data for a different country. --Dan Polansky (talk) 18:48, 16 July 2020 (UTC)
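For those without awk, the same calculation can be sketched in Python, with the 7-day smoothing added. The column names `iso_code`, `date`, `new_cases` and `new_tests` are assumed to correspond to the positions 1, 4, 6 and 14 used in the one-liner; check them against the header of the CSV you download. The sample rows are made up for illustration and are not real data.

```python
import csv
import io
from collections import deque


def positivity_rates(csv_text, iso_code="USA"):
    """Return (date, percent positive) for days with a positive test count."""
    result = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["iso_code"] != iso_code:
            continue
        tests = float(row["new_tests"] or 0)  # blank cells count as zero
        if tests > 0:
            cases = float(row["new_cases"] or 0)
            result.append((row["date"], 100.0 * cases / tests))
    return result


def moving_average(values, window=7):
    """Trailing moving average; emits one value per full window."""
    out = []
    buf = deque(maxlen=window)
    for value in values:
        buf.append(value)
        if len(buf) == window:
            out.append(sum(buf) / window)
    return out


# Made-up sample rows (assumption): only the columns used here are included.
SAMPLE = """iso_code,date,new_cases,new_tests
USA,2020-07-01,50,500
USA,2020-07-02,60,
CAN,2020-07-01,5,100
USA,2020-07-03,40,800
"""

rates = positivity_rates(SAMPLE)
for date, rate in rates:
    print(date, f"{rate:.1f}")
```

To use real data, read the downloaded file into a string and pass it to `positivity_rates`, then feed the rate values to `moving_average`. Note that averaging daily ratios is only one way to smooth; dividing 7-day case sums by 7-day test sums is another, and the plots above do not say which was used.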

Test positivity rate in %, top 7 U.S. states, July 15, 7-day moving average, from coronavirus.jhu.edu[14]:

--Dan Polansky (talk) 09:06, 17 July 2020 (UTC)

Political neutrality and slant.

Statements in the 2nd opening paragraph:

"The initial U.S. response to the pandemic was otherwise slow, in terms of preparing the healthcare system, stopping other travel, or testing for the virus.[17][18][19][a]"

"Meanwhile, President Donald Trump was downplaying the threat posed by the virus and claiming the outbreak was under control.[23][24]"

These statements have negative connotations, come from an op-ed in the quoted source, and are arguably subjective observations rather than facts.

Recommend revision or elimination. — Preceding unsigned comment added by Hughchow (talkcontribs) 21:20, 20 July 2020 (UTC)

Statnews article

I dropped a paragraph tracing to the following article:

The above is not peer-reviewed science. The introductory image alone, showing trucks used as temporary morgues, is something a scientific article would not include. One defect of the above article is that it compares a small number of countries and does not indicate the selection criteria for choosing those countries. And there is no control for confounding factors; what if the black population of New York is biologically more vulnerable to the virus; what if there are more obese people in New York; what if Germany has top-notch healthcare including a high number of ICU beds per 100 000 pop; and so on. If one chooses a large enough selection of countries, that may create something like control for confounding factors even if you do not know what they are, but choosing 5 countries to compare while not even mentioning the possibility of confounding factors is not science, and not sound statistical analysis.

For reference, credentials of the article authors: 'Isaac Sebenius graduated from Harvard College in May 2020 with a degree in molecular and cellular biology. James K. Sebenius is a professor of business administration at Harvard Business School and director of the Harvard Negotiation Project based at Harvard Law School.'

--Dan Polansky (talk) 06:54, 21 July 2020 (UTC)

Statistics section: overestimation of death numbers

I added an important aspect from COVID-19 pandemic#Deaths (after checking the sources, of course): reasons for overestimating the number of deaths. [15] relates to Italy. I know many similar sources for Germany, but at present don't have the time to look for a new source referring specifically to the U.S. Could you help?

"Excess mortality comparing deaths for all causes versus the seasonal average is more reliable" [16]. It would be great to have such a statistic in the article. In Europe, we have EuroMOMO, but again I don't know a similar American institution. --Jwollbold (talk) 13:05, 20 July 2020 (UTC)

You seem to be asking for sources of excess deaths for the U.S. Here are some:
  • Excess Deaths Associated with COVID-19, www.cdc.gov - has blue-bar all-cause death graphs for the whole U.S. and also for U.S. states via "Select a jurisdiction" and also specifically for New York City, assuming dashboard Weekly Excess Deaths was selected
  • The Human Mortality Database, mortality.org - data for more countries than EuroMOMO and has death counts rather than a z-index; here are graphs
Beware that especially last two weeks can be badly affected by registration delays, usually show much lower deaths than actual, and need to be either disregarded or statistically corrected. --Dan Polansky (talk) 06:15, 21 July 2020 (UTC)
Thanks, Dan Polansky - I added the first source. Regarding the second, one should check which statistics is most meaningful.
My first idea was to have an excess mortality chart in the Wikipedia article itself. Do you think this would be helpful? But I couldn't do this myself, technically. --Jwollbold (talk) 20:01, 21 July 2020 (UTC)
Jwollbold, it should be possible to screenshot the cdc.gov chart, upload it as png to commons, and include in the article; from what I understand, it is in public domain. And it would be a fine idea, I think. As an alternative, one could grab the actual data from mortality.org and make a plot using the graphing add-in. --Dan Polansky (talk) 20:22, 21 July 2020 (UTC)
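If someone does plot from raw data, the excess-deaths calculation discussed in this thread can be sketched as below: observed all-cause deaths minus a seasonal baseline, with the most recent weeks dropped because of the registration delays mentioned above. All weekly figures are hypothetical placeholders, not CDC or mortality.org data, and the default of two dropped weeks simply follows the caveat above.

```python
def excess_deaths(observed, baseline, drop_recent_weeks=2):
    """Weekly excess deaths (observed minus baseline).

    The last `drop_recent_weeks` entries are discarded, since recent weeks
    are typically undercounted because of registration delays.
    """
    if len(observed) != len(baseline):
        raise ValueError("observed and baseline must cover the same weeks")
    usable = max(0, len(observed) - drop_recent_weeks)
    return [o - b for o, b in zip(observed[:usable], baseline[:usable])]


# Hypothetical weekly all-cause death counts and seasonal baseline.
observed_weekly = [60_000, 72_000, 78_000, 65_000, 40_000, 20_000]
baseline_weekly = [56_000] * 6

weekly_excess = excess_deaths(observed_weekly, baseline_weekly)
print(weekly_excess, "total:", sum(weekly_excess))
```

Dropping incomplete weeks is the simpler of the two options mentioned above; a statistical correction for reporting lag would need delay estimates that are not discussed here.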

Redundancy between the statistics section and Measuring case and mortality rates

Could someone more familiar with the article than myself check, reduce and/or extend both sections? First, I observed that in "...mortality rates" reasons for overestimating the number of deaths are missing. Or should the respective last paragraph be moved to the statistics section? --Jwollbold (talk) 20:01, 21 July 2020 (UTC)

Need to cite sources, esp. in "Number of U.S. cases by date" and "Progression charts" sections

I have searched in vain for sourcing for the critical cases/deaths/etc charts in these two (sub)sections. Specific inline citations with live links should be provided for each chart. Can someone please add them, for WP:Verifiability? —RCraig09 (talk) 22:37, 14 July 2020 (UTC)

Sources:
But you are right, there are no sources indicated above or under the charts you mentioned, and people who update the charts should ideally fix that. A plain link to the source like this[17] would go a long way to help. --Dan Polansky (talk) 10:05, 16 July 2020 (UTC)
@Rider0101: (I saw you made updates of progression charts): What are the sources for the data in the progression charts? --Dan Polansky (talk) 08:22, 18 July 2020 (UTC)
There is Template:COVID-19 pandemic data/United States medical cases, and it has sources indicated in the table row "State Sources", on a per state basis. Perhaps the US level aggregates are taken from this table. --Dan Polansky (talk) 10:20, 23 July 2020 (UTC)

Dropping daily recoveries

I would be happy to drop the chart with daily recoveries. The data seems to be more noise than signal, and is not very important, I think. What would be important are daily hospitalizations, but we do not have that in the article. --Dan Polansky (talk) 08:55, 21 July 2020 (UTC)

I added weekly hospitalizations. What are the daily recoveries good for? Maybe I just lack the proper imagination for daily recoveries. Is anyone using them to calculate daily current cases (not new cases) and is that meaningfully reliable? --Dan Polansky (talk) 18:47, 21 July 2020 (UTC)
Notionally, comparing graphs of daily recoveries to daily new cases and daily deaths lets one build a picture how the treatment period changes over time, but the data for daily recoveries does not look to be of very high quality, which... limits its value. pauli133 (talk) 15:58, 23 July 2020 (UTC)
pauli133, do we at least know where that data is sourced from? --Dan Polansky (talk) 17:41, 23 July 2020 (UTC)
There's some handwaving about covidtracking and worldometers, but no actual reference, and I don't immediately see this data at either site. Whoever is adding it gets the numbers from SOMEWHERE, but if that's the CDC or random.org, I'm not yet sure. pauli133 (talk) 18:03, 23 July 2020 (UTC)

Recovered cases in the United States

Hi, I would like to ask why there is a difference in the number of recovered patients in the United States between the one from the bar graph and the other from the epidemiology overview chart. — Preceding unsigned comment added by 42.60.88.59 (talkcontribs) 15:58, 15 June 2020 (UTC)

Statistics and weekly periodicity?

The plots showing the number of new COVID-19 cases and the number of deaths show a fairly clear periodicity with a ~7 day period. If there is a reason for this, I think mentioning it in the article would be useful. Is it something to do with when people get tested and their weekly schedule? When they are more likely to interact with others (e.g. on weekends) or something else? — Preceding unsigned comment added by Fcrary (talkcontribs) 02:09, 27 June 2020 (UTC)

New report "Tracking COVID-19 in the United States"

Available here, with a Guardian article about it here. -- Daniel Mietchen (talk) 15:18, 24 July 2020 (UTC)

Chart of active cases, compared to other countries

US has the highest number of active cases within the world. Would it be useful to add a chart, comparing its trend with other most affected countries? --Traut (talk) 09:19, 18 July 2020 (UTC)

COVID-19 Active Cases per 100 000 population

The above is misleading in so far as it disregards growth in number of tests; the misleading effect is particularly pronounced for Sweden and the US; I don't know about Brazil. Comparing test positivity rates would be more useful. The above chart does give some idea and is not completely useless, but one has to keep in mind that there was significant growth in test counts, and it is really hard to keep that in mind for most readers. --Dan Polansky (talk) 11:56, 18 July 2020 (UTC)

(outdent) Below, let me replot the chart that gives a very different picture for the U.S:

Test positivity rate for US, calculated from Our World in Data, from owid-covid-data.csv[18], smoothed via 7-day moving average:

--Dan Polansky (talk) 12:09, 18 July 2020 (UTC)
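For anyone wishing to reproduce a series of this kind, here is a minimal Python sketch of the calculation as described: the daily ratio of new cases to new tests, followed by a trailing 7-day mean. The function names and the sample numbers are hypothetical, reading the actual columns from owid-covid-data.csv is left out, and it is an assumption that the daily ratios are averaged (rather than dividing averaged cases by averaged tests).

```python
def positivity_rates(new_cases, new_tests):
    """Daily test positivity: new cases divided by new tests (None if no tests)."""
    return [c / t if t else None for c, t in zip(new_cases, new_tests)]


def trailing_mean(values, window=7):
    """Trailing moving average; None until a full window of valid values exists."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(sum(chunk) / window)
    return out


# Made-up illustration data, not real figures.
cases = [50, 60, 55, 70, 65, 80, 75, 90]
tests = [1000] * 8
smoothed = trailing_mean(positivity_rates(cases, tests))
print(smoothed)
```

The first six entries are empty because a full 7-day window is not yet available, which is also why a trailing average lags the raw data.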

Of course, the more you test, the more cases you will find. But there's already a graph, including tests and cases. The suggested graph here is how US does, compared to other countries with current or former major outbreaks. Highest test rates you may find in UAE (many cases), high test rates in Bahrain (many, many cases), lower test rates in Qatar or Chile (many, many, many cases) - and how would you like to compare the quality and amount of testing between US and Brazil? So you will need a certain simplification when you do compare US with other countries. Here's another comparison of cases and deaths - selected by countries with most cases. Qatar with 3750 cases/100 000 population, Bahrain with 2170 cases/100k or Chile with 1750 cases/100k are beyond the grid. But deaths in Qatar and Bahrain are surprisingly low. I don't trust the data from worldometers.info very much. But for a comparison of tests per population, their numbers work sufficiently well. See the charts below - and sorry, for this quick comparison I did not bother to select the same chart colors --Traut (talk) 12:51, 18 July 2020 (UTC)
COVID-19 cases per 100 000 population from countries with the most cases
COVID-19 deaths per 100 000 population from countries with the most cases
I believe an intercountry comparison of daily test positivity rates is much more meaningful than an intercountry comparison of daily case counts. Such a comparison is meaningful even if test regimes are vastly different. In case of doubt, we should refrain from publishing country comparison charts; we should take pains not to mislead. Let me emphasize that I am not talking of test rates; I am talking of test positivity rates, that is, the ratio of daily new cases to daily new tests. worldometers.info data is probably generally okay, but they have the grave defect that it does not show daily test count alongside daily case count, as far as I know; but I do not see what worldometers.info has to do with the present proposal. --Dan Polansky (talk) 13:05, 18 July 2020 (UTC)
As for intercountry comparison of covid-coded deaths, that suffers from different covid-coding between countries and different testing regimes; better compare excess deaths, available for US on US level and state level and for many European countries. --Dan Polansky (talk) 13:11, 18 July 2020 (UTC)
So obviously you care about test rates - but I don't. It was your suggestion to talk about tests, I didn't. But you can go to [worldometers] and sort the table by Tests/1M pop and find out about the total number of tests per country. This may indicate why some countries have high case numbers, because they have high test numbers. Take e.g. Gulf Daily News who claim 40% of tested people in Bahrain .[19]. But since you do not have exact and comparable data for those countries, you have to take what you got. The assumption of active case numbers is already a simplification, because hardly any country does test for recovered people. Here in my chart the assumption is that people are healthy 14 days after test, if they did not die. Official quarantine recommendations are down to 10 days instead by now. Once again: you can't compare countries by test numbers, because you do not have that kind of information. But you have official case counts (mine are from ECDC) and reasonable population counts (2018-12-31). --Traut (talk) 13:26, 18 July 2020 (UTC)
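As a rough illustration of the simplification described in the comment above, the following Python sketch estimates active cases as the cases reported in the last 14 days minus the deaths reported in the same days, scaled per 100 000 population. This is only one possible reading of "healthy 14 days after test, if they did not die"; the function name, the handling of deaths and the sample numbers are all assumptions, not the method actually used for the chart.

```python
def active_cases_per_100k(new_cases, new_deaths, population, window=14):
    """Estimate active cases per 100 000 people.

    Assumption: a case counts as active for `window` days after it is
    reported, unless the person died; deaths reported within the window
    are subtracted as an approximation.
    """
    out = []
    for i in range(len(new_cases)):
        start = max(0, i - window + 1)
        active = sum(new_cases[start:i + 1]) - sum(new_deaths[start:i + 1])
        out.append(max(active, 0) * 100_000 / population)
    return out


# Made-up illustration data: 10 cases and 1 death per day, 1 million people.
estimate = active_cases_per_100k([10] * 20, [1] * 20, 1_000_000)
print(estimate[-1])
```

With constant daily numbers the estimate levels off once 14 days of data are available, which shows how strongly the result depends on the assumed recovery period.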
Worldometers does not plot daily test counts. For instance, on page https://www.worldometers.info/coronavirus/country/us/ it plots daily case counts but it does not plot daily test counts, so the page is gravely misleading.
Plotting daily case counts between countries while not accounting for growth of test rates is obviously misleading, and this issue cannot be addressed by saying "I don't care about test counts", since it is not about what particular editors care about; it is about what is a fair representation and comparison and what is misleading the reader.
The reader can obtain intercountry comparisons of test positivity rates at https://ourworldindata.org/grapher/positive-rate-daily-smoothed?tab=chart; there is a default list of countries but more countries can be added by clicking on "Add country". --Dan Polansky (talk) 13:43, 18 July 2020 (UTC)
And the above chart that plots deaths for countries with most cases is misleading by choice of countries: why not show top countries by deaths per 100 000 pop? This way, Sweden artificially looks the worst, which it isn't. The reader can get a more relevant picture at File:COVID-19 Outbreak World Map Total Deaths per Capita.svg. For an accurate and relevant picture of Sweden, the reader is well advised to consult the charts at COVID-19 pandemic in Sweden#Additional data, charts and tables. --Dan Polansky (talk) 14:17, 18 July 2020 (UTC)
That's because the second chart does show the same countries as the first chart. It is not supposed to show the countries with the most deaths, but the deaths of the countries with the most cases. But again, the main question here is how to compare the number of active cases per capita for US vs. other highly affected countries. --Traut (talk) 14:59, 18 July 2020 (UTC)
Active cases per capita is 1) not reliably known, 2) distorted by different testing regimens, 3) not particularly relevant; better drop the comparison. Daily progression of test positivity rate is known for many countries, and is relevant--Dan Polansky (talk) 15:06, 18 July 2020 (UTC)
As for "deaths of the countries with the most cases", I don't see how this is relevant. --Dan Polansky (talk) 15:07, 18 July 2020 (UTC)

Thanks for the feedback, Dan, you made your point, but you have a very different idea of what you want. Now I'd like feedback from other people who understand my idea to compare how US performs, compared to other countries. --Traut (talk) 15:45, 18 July 2020 (UTC)

Sure. For others: please read the above discussion and think of the intellectual duty of not misleading the reader. From what I can see, my substantive objections were not addressed above; rather, it all became very subjective, such as "I don't care" or "you have a different idea"; it should have been "X is fair enough", "Y is misleading", and the like. --Dan Polansky (talk) 15:59, 18 July 2020 (UTC)

I happened across this discussion, saw additional opinions had been sought and decided to voice mine. If I ought not to have because of my lack of understanding or such, I hope you'll forgive my temerity. Additionally, on reading the discussion, I recognized Traut's name, and, in case it's perceived as bias, I'm declaring that I contributed to a discussion in which Traut proposed similar graphs for another article, and broadly supported that point. However, I did not in any way come to this page because of Traut: I was looking at the article, then the talk page, then this discussion, and only then did I recognize Traut's name.

As to be expected from a third voice, I agree with some points from each commenter. As I see it, the discussion has the following main points: 1, is some sort of comparison graph useful, and 2, if so, what graph would best maximize readers' comprehension and minimize their being misled.

On the first topic, I wholeheartedly agree that a comparison graph is useful. An oft-mentioned concept in the communication of mathematics is whether a number is large. Without such a comparison a reader cannot easily tell whether the number of cases is large because of a relatively high number of infections or simply because the USA is a populous country.

On the second topic, I agree with Dan Polansky that the better statistic to chart is excess deaths, presumably per 100,000 people or similar. I think Dan Polansky is right about high testing rates misleading a reader but wrong about test positivity being a useful statistic. On positivity, as I understand it, even two countries with similar populations and numbers of infected might have completely different positivity rates just because one tests many more people than the other, so I don't see that as an illuminating statistic.

Also on the second topic, ideally the choice of comparison regions ought to follow some clearly stated criteria: regions with similar area, GDP, population, population density or Covid policy. Other countries have much larger or smaller populations, so comparing the USA with the EU or ASEAN might be more appropriate. I can't offer any good concrete suggestions for that though.

In conclusion, I think Traut is right that a comparison is useful and Dan Polansky is right that the better statistic is excess deaths over average deaths, presumably per 100,000 people.

68.96.208.77 (talk) 17:32, 24 July 2020 (UTC)Constructive Feedback

Semi-protected edit request on 29 July 2020

This sentence makes no sense.

Scott Gottlieb, former commissioner of the FDA, when a vaccine is ready for testing, about 25,000 people, in different groups, would be given the vaccine, two weeks apart, until 100,000 people have been inoculated over about six weeks.

What does Gottlieb have to do with it? He's making the sentence ungrammatical without making it make sense. Please leave him out and change the sentence to this.

When a vaccine is ready for testing, about 25,000 people, in different groups, will be given the vaccine, two weeks apart, until 100,000 people have been inoculated over about six weeks.

2601:5C6:8081:35C0:41B9:4147:7563:3EFE (talk) 00:38, 29 July 2020 (UTC)

The sentence does not have proper grammar and most certainly needs to be changed, but the sentence and its supposition appear to be attributed to Gottlieb. My proposed change would be: Gottlieb, former commissioner of the FDA, says that when a vaccine is ready for testing, about 25,000 people in different groups would be given the vaccine two weeks apart, until 100,000 people have been inoculated over roughly six weeks. Tenryuu 🐲 ( 💬 • 📝 ) 00:53, 29 July 2020 (UTC)
@Tenryuu: I've taken the entire sentence out. The source is an OpEd written by Gottlieb, and in context he is proposing a stepped-wedge trial. The numbers are completely arbitrary and as far as I can tell, Gottlieb has no plan to actually implement it (he can't anyways as he left the FDA in 2019). The paragraph makes more sense without it as now it focuses on what is actually happening instead of hypotheticals.  Ganbaruby! (Say hi!) 12:33, 29 July 2020 (UTC)
Ganbaruby, thanks for doing that. The article appears to be locked behind a paywall so I didn't have any context beyond that. —Tenryuu 🐲 ( 💬 • 📝 ) 15:01, 29 July 2020 (UTC)

Number of US cases by date

Could we get these two charts sorted by highest/latest value? Right now they're sorted by... I don't even know what.

I'm happy to swap things around, but I don't want to step on toes belonging to any current maintainers. pauli133 (talk) 15:55, 23 July 2020 (UTC)

Do you mean ... sorting the legend? It would be ideal if the legend order matched the ranking on the most recent date - closest to the legend. Notably, about 1 in 10 men cannot distinguish all 12 of those colors, so this change would make the chart more accessible. However, if this change made the assignment of colors to states vary from day to day, that would be confusing for frequent readers. 67.169.166.36 (talk) 11:30, 25 July 2020 (UTC)

I went ahead and implemented, so you can see the result. The source is sorted alphabetically, and the y values (the legend) are sorted in order of cases, to match the endpoint on the graph. Should be easier to edit AND to read now. pauli133 (talk) 17:40, 29 July 2020 (UTC)
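The ordering described above can be sketched in a few lines of Python: the source data stays alphabetical, and only the legend order is derived from each series' most recent value. The function name and the numbers are hypothetical and only illustrate the idea.

```python
def legend_order(series):
    """Return series names sorted by their latest value, largest first."""
    return sorted(series, key=lambda name: series[name][-1], reverse=True)


# Hypothetical cumulative counts, kept in alphabetical order in the source.
cases_by_state = {
    "Arizona": [10, 40, 90],
    "California": [20, 60, 150],
    "Florida": [5, 70, 120],
}
print(legend_order(cases_by_state))  # ['California', 'Florida', 'Arizona']
```

Keeping a separate, fixed mapping from state to color would avoid the day-to-day color changes that were raised as a concern earlier in this thread.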

Looks great! 67.169.166.36 (talk) 00:19, 30 July 2020 (UTC)

Comparison to deaths in wars

In diff, I removed a comparison to deaths in wars waged outside of the U.S. This is not a meaningful comparison or context for deaths caused by an epidemic; an epidemic impacts domestic population, unlike a war waged outside the U.S. Therefore, a severe epidemic can easily cause more U.S. deaths than such a war. Furthermore, doing the comparison without adjusting the numbers for population growth is a further source of misleading effect. NYT was indicated as a source of that comparison; NYT is no reliable source on science. --Dan Polansky (talk) 19:40, 30 July 2020 (UTC)

As for diff saying "restore stats that put death numbers into context" in the edit summary, there is already some good context in the article, in comparing the covid pandemic with other significant pandemics, and that comparison is adjusted for population growth. I mean the following: "For comparison, the CDC estimated deaths in the U.S. from the 1918 Spanish Flu, the 1957–1958 influenza pandemic, and the 1968 Hong Kong flu were 675,000, 116,000 and 100,000 respectively. Adjusted for growth in population, these would be per-capita equivalents of 2,147,000, 218,000, and 164,000, respectively, in 2020." That seems to be a comparison done right. Other meaningful context is provided by all-cause death charts such as those available from CDC[20] and mortality.org via shinyapps.io[21]; these are not in the article. --Dan Polansky (talk) 08:20, 31 July 2020 (UTC)
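The population adjustment quoted above amounts to scaling a historical death count by the ratio of today's population to the population at the time. A minimal Python sketch follows; the function name is hypothetical and the population figures are approximate values assumed for illustration, not taken from the article or its sources.

```python
def per_capita_equivalent(deaths, population_then, population_now):
    """Scale a historical death count to today's population size."""
    return deaths * population_now / population_then


# Assumed approximate U.S. populations (illustration only):
# about 103.2 million in 1918 and about 328.2 million around 2020.
equivalent_1918 = per_capita_equivalent(675_000, 103.2e6, 328.2e6)
print(round(equivalent_1918, -3))
```

With these assumed populations the 1918 figure comes out close to the 2,147,000 quoted in the article, which suggests the article's numbers were derived in a similar way.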

No. vs Number

This is a minor thing but it shows the poor editing skills rife throughout WP.

There is no reason to use No. instead of Number on the charts in the statistics section.

 Not done: it's not clear what changes you want to be made. Please mention the specific changes in a "change X to Y" format and provide a reliable source if appropriate. JTP (talkcontribs) 22:30, 31 July 2020 (UTC) Apologies, wrong template. JTP (talkcontribs) 06:04, 1 August 2020 (UTC)

It is obvious.

 Not done for now: For consistency, it matches the abbreviation used in the axis labels. It isn't anything major, so if someone else objects to this, feel free to change it. JTP (talkcontribs) 06:06, 1 August 2020 (UTC)

Comparison with other countries from Statnews

In diff, I dropped again section "Comparison with other countries". I first did so in diff where I said "source to statnews.com and using weak language, and the article is in the category "First Opinion" and is not peer-reviewed science". The section was sourced from this:

I posted more detail in Talk:COVID-19 pandemic in the United States/Archive 13#Statnews article on 21 July 2020. There, I explained why the article is Wikipedia-unworthy as a source.

I ask anyone planning to restore the section to respond to the concerns I raised rather than restoring it without any justification. --Dan Polansky (talk) 07:39, 1 August 2020 (UTC)

Semi-protected edit request on 3 August 2020

Please change

Nancy Messonnier, a director at the CDC, explained that with no vaccine or treatment available, Americans must be prepared to take other precautions.

to

Nancy Messonnier, a director at the CDC, explained that with no vaccine or treatment available, Americans would need to prepare to take other precautions.

"Must" fits the present tense best, and it doesn't quite sound right for something that's months in the past. 64.203.187.101 (talk) 14:59, 3 August 2020 (UTC)

 Partly done. Used "should" instead.  Ganbaruby! (Say hi!) 13:02, 4 August 2020 (UTC)

Exponential fit

For July an exponential fit to the COVID-19 [laboratory-confirmed] cases in the United States table's mortality figures in the article is very good (deaths ≃ 120757.5035 × exp(0.005817882889 × day) [with day 1 = 8 July]; maximum discrepancy 421 out of 125,000 for period 8-22 July).

Extrapolating that fit (straightforward mathematics, absolutely no opening for anything subjective) gives 241,317 deaths for 2 Nov 20. Pol098 (talk) 16:14, 23 July 2020 (UTC)

Why would you want to do exponential fit when there is no current exponential growth of total deaths? I don't understand. Nor is there exponential growth of total cases or current hospitalizations. And what would be the underlying epidemiological model, unlimited exponential growth? --Dan Polansky (talk) 18:01, 23 July 2020 (UTC)
An exponential of form y = a × exp(b × x) gives a very good fit: curve-fitting gives deaths ≃ 120757.5035 × exp(0.005817882889 × day) [day 1 = 8 July], which matches the 15 data points with a maximum discrepancy of 241 (out of about 125,000). In terms of a graph: if number of deaths is plotted against date, with deaths on a logarithmic scale (intervals 1-10, 10-100, 100-1000, etc. equally spaced), the line is almost straight for the recent past, and can be well-fitted by the exponential form. There is no detailed modelling, underlying epidemiological or other, or indeed assumption involved; it is simple mathematical extrapolation.

While July has been quite closely exponential, the figures may well drop—or rise—beyond what simple extrapolation says. The extrapolated figures for 23 July and days following are 132,538; 133,311; 134,089; 134,872; 135,659; 136,450. Compare them with the numbers as they are added to the article. I don't call this a "prediction", it's just extending the curve as it was in the past 15 days.

For comparison I did this calculation on 16 July, with dates from 1-15 July; the extrapolated figure for 22 July was 126,196. The actual figure posted in the article today for 22 July was 126,511. There's absolutely no personal interpretation here; anyone can replicate this with the figures from the article today and on 16 July. HTH, Pol098 (talk) 18:31, 23 July 2020 (UTC)
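For readers who want to replicate a fit of this kind, here is a minimal Python sketch. It fits y = a × exp(b × x) by ordinary least squares on the logarithm of y, which is a common shortcut and not necessarily what the curve-fitting tool used above does (a fit on the raw values would give slightly different coefficients). The function names are hypothetical; the coefficients A and B are the ones quoted in this thread.

```python
import math


def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on (x, ln y)."""
    n = len(xs)
    logs = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    b = sxl / sxx
    a = math.exp(mean_l - b * mean_x)
    return a, b


def evaluate(a, b, x):
    """Value of the fitted curve at x."""
    return a * math.exp(b * x)


# Coefficients quoted in the comment above (day 1 = 8 July).
A, B = 120757.5035, 0.005817882889
print(round(evaluate(A, B, 16)))  # day 16 = 23 July
```

Evaluating the quoted coefficients at day 16 reproduces the 132,538 given above for 23 July. Whether extending such a curve months ahead is meaningful is a separate question, discussed in the replies below.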
I'm not 100% sure what you're trying to do, but: just because anyone can replicate it, does not mean that it is automatically our place to do so. If a reliable source makes a projection, we can quote it. pauli133 (talk) 21:23, 23 July 2020 (UTC)
Exponential extrapolation done without a critical attitude has already produced enough of a fiasco, has it not? Let's not do more of it. --Dan Polansky (talk) 08:35, 25 July 2020 (UTC)
"a projection" I was very clear that this is purely a mathematical exercise, not a prediction or projection. In fact, if I had to bet on it, I'd expect precautionary measures to kick in and make the actual figure months hence somewhat lower than the extrapolation (my point is simply that growth 1-22 July was demonstrably exponential). There are sources that talk about the growth and its exponential nature; my comment was to provide numbers to help anyone working with sources that provide words without numbers. My extrapolation has already diverged somewhat for 23-24 July; the actual figures are significantly higher than the extrapolated figures I gave. HTH, Pol098 (talk) 11:43, 25 July 2020 (UTC)
The above seems pretty bizarre to me; the claim that a certain extrapolation stating "241,317 deaths for 2 Nov 20" is not a projection seems hard to understand; the claim that it is not a prediction seems better. But what really caught my attention was the claim that the fit was "straightforward mathematics, absolutely no opening for anything subjective": sure, not being subjective is a property of all deterministic processes and algorithms. Some astrologers use deterministic transformations of sky observations to arrive at certain predictions, but that does not make these predictions meaningful. There is a whole class of algorithms using deterministic transformations to produce something that has certain properties (not all of them) of random outputs; these are pseudorandom generators. A particular deterministic procedure or method must be meaningful or be meaningfully applied; the procedure's being deterministic alone does not make its application meaningful. And the application of exponential fitting for something that would not grow exponentially for over three months (until the quoted 2 Nov 20) even in the case of zero interventions cannot possibly be meaningful. --Dan Polansky (talk) 16:28, 28 July 2020 (UTC)
OK, whether or not we call it a projection is semantics; "not a prediction" is certainly better, so please thus interpret my comment. The process I used is a simple least-squares fit to an exponential determined by two parameters (using an available tool); the visible straightness of the lines in the log graph in the article, and the small differences between the fitted function and reported numbers suggests that fitting an exponential function is suitable. I did this to try to understand what seems to be happening, and what might happen if it continued in the same way (hopefully not). The exponential nature of the curve in the article might be worth mentioning if a source has published it. Thanks for the comments. Best wishes, Pol098 (talk) 21:22, 29 July 2020 (UTC)
Given the daily new hospitalizations for U.S. are beyond a peak[22], it is unlikely that the cumulative deaths are going to grow exponentially for months. --Dan Polansky (talk) 09:02, 1 August 2020 (UTC)
I was very clear that this is purely a mathematical exercise, not a prediction or projection. In fact, if I had to bet on it, I'd expect precautionary measures to kick in and make the actual figure months hence somewhat lower than the extrapolation. (I said this earlier in the thread.) Pol098 (talk) 20:28, 1 August 2020 (UTC)
A mathematical operation on real-world data indicating a possible value of a future data point is not "purely a mathematical exercise", by definition. --Dan Polansky (talk) 17:29, 3 August 2020 (UTC)

Contact tracing, typo

That paragraph cites "contract tracing", which sounds like a job for the Auditor General..... 70.50.50.177 (talk) 14:26, 9 August 2020 (UTC)

 Done. Thank you (and also to 71.206.133.192) for pointing this out. Aoi (青い) (talk) 00:17, 14 August 2020 (UTC)

Dropping daily recoveries 2

I went ahead and dropped the daily recoveries chart. No sources were provided by the updater despite two requests. The utility and reliability of this type of chart is unclear. In the edit summary, I mentioned covidtracker.com but meant covidtracking.com. See also Talk:COVID-19 pandemic in the United States/Archive 13#Dropping daily recoveries. --Dan Polansky (talk) 10:23, 5 August 2020 (UTC)

Double-sided moving average

In diff, DASL51984 changed the moving average to a double-sided one. I am not sure that is a good thing. I for one am used to the normal moving average, which is easy and plain to calculate but admittedly introduces a delay. On the other hand, anyone glancing at a chart showing both the raw data and the average can see that there is a delay so the reader is unlikely to be misled.

Are there any sources on covid or other phenomena using a double-sided moving average? Worldometers used a double-sided average briefly and then switched to the plain single-sided one; covidtracking.com uses the plain single-sided one.

And it now says the average is "weighted"; weighted how? What are the weights? How does one actually calculate the weighted double-sided average? Other people need to be able to update this, and the computing-literate reader needs to be able to reproduce the average from the raw data. --Dan Polansky (talk) 08:24, 4 August 2020 (UTC)

A: I support the changes. When I added the simple moving average, I had plans to eventually switch to something properly centered instead of offset. It looks like @DASL51984 has beaten me to it, and gone a little farther, with the weighting. Great! The charts are better now.
B: I 100% agree that this needs a bit more transparency and maintainability; if DASL can comment the equation(s) used into the article, I think that need will be satisfied. pauli133 (talk) 14:07, 4 August 2020 (UTC)
The above expressed support but with zero rationale. I have no answers to the key questions I asked above. And the daily case chart's average has the rightmost part look suspect: it turns up a little, which I would think it should not; and if it is centered, the final 3 points should probably show no average at all. --Dan Polansky (talk) 16:19, 4 August 2020 (UTC)
Meanwhile, DASL51984 updated the moving average, which is now suspiciously excessively downward sloping at the right end, for both cases and deaths. This is all suspect; let's return to plain 7-day moving average, no double-siding. --Dan Polansky (talk) 19:15, 4 August 2020 (UTC)
We can solve the wagging tail issue by not including data for days that don't yet have a full seven day window underpinning them, that's simple enough. We do still need an explanation of the math. pauli133 (talk) 19:43, 4 August 2020 (UTC)
Let's return to plain 7-day moving average, the one used by multiple sites plotting covid data, and the problem is solved. What source explains that the double-sided moving average is appropriate for what is a time series? What makes you think you know what you're doing with the double-sided moving average? --Dan Polansky (talk) 19:51, 4 August 2020 (UTC)
This is a collaborative project. There is no reason for you to be aggressive or demeaning. You aren't going to browbeat me into doing things your way, so stop it.
As it happens, I was already working on unweighted values, because they are easier to maintain and don't differ noticeably. I still believe that the purpose of the graph is to aid the reader in visualizing the data, and the purpose of an average line is to aid the reader in making sense of the general trends in the data - and that having that offset by three days isn't beneficial. Neither is more or less accurate, but one is more convenient for the reader, hence my preference. pauli133 (talk) 20:09, 4 August 2020 (UTC)
Asking for sources is the normal Wikipedia thing; I indicated sources that are doing the plain average, and the above editor gave us no sources doing otherwise so far. If the above editor does not have sources but has experience with data processing in this manner in professional setting, that would be another thing, but such has not been indicated yet. I fear that removing the offset is misleading, but do not have a source for that, and I would have to figure out a compelling analysis. Opinions and preferences not backed up by anything should be taken with a dose of skepticism. --Dan Polansky (talk) 20:19, 4 August 2020 (UTC)
If [symbol not preserved] represents a particular data point, and [symbol not preserved] represents the corresponding data after smoothing, then my original formula went like this: [formula not preserved]. DASL51984 (Speak to me!) 12:05, 6 August 2020 (UTC)
Thank you. Do we have a source that uses a similar formula for what is a time series, not necessarily for covid data? And is the benefit of using this formula, if any, worth the more complicated explanation to the reader? (The above is a 15-day average, counting all days being averaged. To prevent side effects, we would need to omit the average from last 7 days.) --Dan Polansky (talk) 13:04, 6 August 2020 (UTC)
The reason I used this is that a simple 7-day moving average has a very odd impulse response. When you hit a spike, it goes up, stays there, and abruptly goes back down. With linear weighting, you don't have the delay associated with a one-sided average, and any "spikes" get mapped to triangular-looking peaks. I should also mention that I copy the first and last data points as "dummy values" to try to fill in missing values at the beginning and end. I'm still trying to find a source using a weighted average. DASL51984 (Speak to me!) 14:43, 6 August 2020 (UTC)
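The difference in impulse response described above can be illustrated with a short Python sketch. The exact formula was not preserved in this thread, so the weights used here (1, 2, ..., 8, ..., 2, 1 over 15 days, normalized to sum to one) are an assumption based on the "linear weighting" and "15-day average" descriptions; the function names are hypothetical. Unlike the method described above, this sketch leaves the ends empty instead of padding them.

```python
def trailing_average(values, window=7):
    """Plain trailing moving average; None until a full window exists."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i - window + 1:i + 1]) / window)
    return out


def triangular_centered_average(values, half_width=7):
    """Centered average with linearly decreasing weights (assumed form).

    Points without a full window on both sides are left as None.
    """
    weights = [half_width + 1 - abs(k)
               for k in range(-half_width, half_width + 1)]
    total = sum(weights)
    out = []
    for i in range(len(values)):
        if i - half_width < 0 or i + half_width >= len(values):
            out.append(None)
        else:
            window = values[i - half_width:i + half_width + 1]
            out.append(sum(w * v for w, v in zip(weights, window)) / total)
    return out


# A single spike in an otherwise flat series (made-up data).
spike = [0] * 10 + [448] + [0] * 10
plateau = trailing_average(spike)
peak = triangular_centered_average(spike)
print(plateau)
print(peak)
```

The trailing average turns the spike into a flat seven-day plateau that starts on the day of the spike, while the weighted centered average turns it into a symmetric triangular peak centered on that day.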
The method 'I copy the first and last data points as "dummy values" to try to fill in missing values at the beginning and end' seems all too likely to create misleading final trend part; do you have a source describing this method and its possible limitations? --Dan Polansky (talk) 14:54, 6 August 2020 (UTC)
A limitation of the above method: when the final unfiltered point is one of those periodically occurring low points, you will copy the low value to missing 7 future points, and that will artificially drag the average down. --Dan Polansky (talk) 15:00, 6 August 2020 (UTC)
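The dragging effect can be reproduced with a small Python sketch. For brevity it uses an unweighted centered 7-day average with three padded points per side, not the 15-day weighted version discussed above; the function name and the data are hypothetical. The series has a weekly dip and a constant weekly mean, and it ends on the low day.

```python
def centered_average_padded(values, half_width=3):
    """Centered unweighted average; missing neighbours at either end are
    filled by repeating the first or last observed value."""
    padded = ([values[0]] * half_width + list(values)
              + [values[-1]] * half_width)
    size = 2 * half_width + 1
    return [sum(padded[i:i + size]) / size for i in range(len(values))]


# Three identical weeks with a low final day (made-up data, weekly mean 9).
week = [10, 10, 10, 10, 10, 10, 3]
series = week * 3
smoothed = centered_average_padded(series)
print(smoothed[10], smoothed[-1])
```

In the interior the average stays at the weekly mean of 9, but at the final point it falls to 6 even though nothing in the underlying data has changed, purely because the low value was copied into the padded days.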
The artificial dragging is exactly the problem I was having. DASL51984 (Speak to me!) 15:48, 6 August 2020 (UTC)
The simple 7-day moving average does not artificially drag any value down; it does not involve filling first and last data points at all. And we need sources; or are you inventing a new method and applying it to covid time series? --Dan Polansky (talk) 15:51, 6 August 2020 (UTC)
I'm really not sure if anyone has even thought of this before. I may be inventing something totally new.
And by the way, I tried making the value that's copied at the end equal to the average of the 7 most recent values, instead of just the most recent value. I tried this out and it seems to have greatly reduced the dragging. DASL51984 (Speak to me!) 16:03, 6 August 2020 (UTC)
Standard methods have been studied, applied and are understood. A novel method not only probably violates some Wikipedia policy, but also runs the risk of introducing problems that neither you nor I will necessarily notice. (I did happen to notice a problem, above.) Applying a method for which we have no source seems like asking for trouble. --Dan Polansky (talk) 16:13, 6 August 2020 (UTC)

Calculating moving average using awk

You can calculate the 7-day moving average using awk on Windows:

echo 1, 0, 4, 5, 18, 15, 28, 26, 64, 77, 101 | awk -F, -vn=7 "{for(i=1;i<=NF; i++) {s+=i>n?$i-$(i-n):$i; if(i>=n){printf \"%.0f, \", s/n}else{printf \", \"}}}"

You can put the result into clipboard:

echo 1, 0, 4, 5, 18, 15, 28, 26, 64, 77, 101 | awk -F, -vn=7 "{for(i=1;i<=NF; i++) {s+=i>n?$i-$(i-n):$i; if(i>=n){printf \"%.0f, \", s/n}else{printf \", \"}}}" | clip

You can do the calculation on Linux:

echo 1, 0, 4, 5, 18, 15, 28, 26, 64, 77, 101 | awk -F, -vn=7 '{for(i=1;i<=NF; i++) {s+=i>n?$i-$(i-n):$i; if(i>=n){printf "%.0f, ", s/n}else{printf ", "}}}'

If you are on Linux or a modern Mac, you already have awk. For Windows, you can install awk from ezwinports or GnuWin32 project. --Dan Polansky (talk) 08:16, 8 August 2020 (UTC)

Plotting all-cause deaths

The following Python script ("plotUsData.py") outputs wiki-syntax for x and y data for plotting data for weekly all-cause deaths, requiring the user to pass the script Weekly_excess_deaths_Full_Data_data.csv downloaded from CDC[23]:

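import sys, csv, datetime

# Usage: plotUsData.py Weekly_excess_deaths_Full_Data_data.csv
fileName = sys.argv[1] # e.g. "Weekly_excess_deaths_Full_Data_data.csv"
data = []
with open(fileName) as csvFile: # Close the file once reading is done
  for line in csv.reader(csvFile):
    if "Observed Number" not in line[0]: # Skip the header row
      date = datetime.datetime.strptime(line[2], "%B %d, %Y") # Month name, day, year
      deaths = line[0].replace(",", "") # Drop thousands separators
      data.append( (date, deaths) )

data.sort(key=lambda x: x[0]) # Sort chronologically

# Emit the x (dates) and y (deaths) parameters in wiki syntax
datesOut = ", ".join([k.strftime("%Y-%m-%d") for k, v in data])
deathsOut = ", ".join([v for k, v in data])
sys.stdout.write("|x =" + datesOut + "\n")
sys.stdout.write("|y =" + deathsOut + "\n")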

The above should work not only for the whole U.S. but also for individual states, provided the proper .csv is downloaded; tested on Arizona. --Dan Polansky (talk) 14:49, 8 August 2020 (UTC)

To obtain Weekly_excess_deaths_Full_Data_data.csv:

  • 1) Visit CDC[24].
  • 2) Select the chosen state in "Select a jurisdiction" field.
  • 3) In the chart below the field, click on the icon for Download.
  • 4) In the popup window, choose "Data" as the format.
  • 5) In the larger popup window, select "Full Data" tab.
  • 6) Click on "Download all rows as a text file" and save the csv.

--Dan Polansky (talk) 15:53, 8 August 2020 (UTC)

Time to reach one million, two million, etc.

Is there a place we can easily see how long it took to reach one million, two million, three million cases? I'm just now looking at a month-old newspaper article that has this information and predicts 4 million by July 22.— Vchimpanzee • talk • contributions • 18:46, 8 August 2020 (UTC)

August 8: about 5 mil. {{3125A|talk}} 18:57, 8 August 2020 (UTC)
The table at the top of the "Timeline" section allows you to see the day-by-day numbers for the US. The million dates were 4/28, 6/12, 7/8, 7/24, and 5 million will be in a few days. 68.7.103.137 (talk) 22:58, 8 August 2020 (UTC)
There is no convenient place to see when each milestone was reached. Or is there?— Vchimpanzee • talk • contributions • 21:21, 9 August 2020 (UTC)
The "milestones" mean nothing; they relate to absolute numbers rather than ratios. But if you cannot resist the temptation, you can consult OWID[25], and see that 4 million confirmed cases in the U.S. were reached on July 24. Had the U.S. slowed down testing as other countries did, the milestone would have been reached much later, showing again the dubious utility of that "milestone". --Dan Polansky (talk) 10:44, 10 August 2020 (UTC)
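
For anyone who wants to derive the milestone dates from a day-by-day table of cumulative counts, the lookup can be sketched in Python as below. This is only a sketch: the function name milestone_dates is made up, and the numbers in the example are toy values, not real case counts.

```python
import datetime


def milestone_dates(series, step=1_000_000):
    """Return {milestone: date} for the first date each multiple of step is reached.

    series is a list of (date, cumulative_count) pairs sorted by date.
    """
    reached = {}
    next_milestone = step
    for date, cumulative in series:
        # A single day's jump may cross more than one milestone.
        while cumulative >= next_milestone:
            reached[next_milestone] = date
            next_milestone += step
    return reached


if __name__ == "__main__":
    # Made-up toy numbers for illustration only, not real case counts.
    start = datetime.date(2020, 1, 1)
    toy_counts = [400_000, 900_000, 1_100_000, 1_900_000, 3_200_000]
    toy = [(start + datetime.timedelta(days=i), c) for i, c in enumerate(toy_counts)]
    for milestone, date in milestone_dates(toy).items():
        print(milestone, date.isoformat())
```

The caveat raised above still applies: the dates depend on absolute confirmed counts and therefore on how much testing was done.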

Plotting all-cause deaths from mortality.org

The following Python script ("plotHmd.py") outputs wiki-syntax x and y data for plotting weekly all-cause deaths for a chosen age group (e.g. ages 0-14y), from mortality.org AKA Human Mortality Database (HMD):

import sys, csv, datetime

fileName = sys.argv[1] # e.g. stmf.csv
countryCode = sys.argv[2] # e.g. USA
figureFieldName = sys.argv[3] # e.g. D0_14 (deaths for 0-14y), DTotal (total deaths)

data = []
file1 = open(fileName)
file1.readline() # Two intro lines without data
file1.readline()
for line in csv.DictReader(file1):
  if line and line["CountryCode"] == countryCode and line["Sex"] == "b":
    deaths = int(float(line[figureFieldName]))
    date = line["Year"] + " " + line["Week"] + " 0"
    date = datetime.datetime.strptime(date, "%Y %U %w")
    data.append( (date, deaths) )

data.pop() # Drop last two weeks for too big registration delay effect
data.pop()

datesOut = ", ".join([k.strftime("%Y-%m-%d") for k, v in data])
deathsOut = ", ".join([str(v) for k, v in data])
sys.stdout.write("|x = " + datesOut + "\n")
sys.stdout.write("|y = " + deathsOut + "\n")

Usage:

plotHmd.py stmf.csv USA D0_14

To obtain stmf.csv, go to mpidr.shinyapps.io/stmortality and at the bottom of the left pane click on the icon to the right of "CSV".

You can use another country code to plot for another country/region: AUT (Austria), BEL (Belgium), BGR (Bulgaria), CHE (Switzerland), CZE (Czechia), DEUTNP (Germany), DNK (Denmark), ESP (Spain), EST (Estonia), FIN (Finland), FRATNP (France), GBRTENW (England and Wales), GBR_SCO (Scotland), HUN (Hungary), ISL (Iceland), ISR (Israel), ITA (Italy), LTU (Lithuania), LUX (Luxembourg), LVA (Latvia), NLD (Netherlands), NOR (Norway), POL (Poland), PRT (Portugal), SVK (Slovakia), SVN (Slovenia), SWE (Sweden), USA (U.S.).

You can use another field code: D0_14 (deaths for 0-14y), D15_64 (deaths for 15-64y), D65_74 (deaths for 65-74y), D75_84 (deaths for 75-84y), D85p (deaths for 85+y), DTotal (deaths total).

The script drops the last two data points because they are obviously very badly affected by registration delay. --Dan Polansky (talk) 15:02, 9 August 2020 (UTC)

Updated. --Dan Polansky (talk) 13:24, 10 August 2020 (UTC)
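
A note on the week-to-date conversion in plotHmd.py: the script uses strptime with "%Y %U %w", which counts weeks from the first Sunday of the year. If the Week column in stmf.csv follows ISO 8601 week numbering (an assumption not confirmed in this thread; check the HMD documentation), the plotted date can be shifted by a few days relative to the actual week. The sketch below compares the two conversions; the function names are made up, and date.fromisocalendar requires Python 3.8 or later.

```python
import datetime


def iso_week_start(year, week):
    """Return the Monday that starts ISO 8601 week `week` of ISO year `year`."""
    # date.fromisocalendar requires Python 3.8 or later.
    return datetime.date.fromisocalendar(year, week, 1)


def sunday_week_start(year, week):
    """Return the date that strptime("%Y %U %w") yields for weekday 0 (Sunday)."""
    text = "%d %d 0" % (year, week)
    return datetime.datetime.strptime(text, "%Y %U %w").date()


if __name__ == "__main__":
    # Compare both conversions for the first week and for week 27 of 2020.
    for week in (1, 27):
        print(week, iso_week_start(2020, week), sunday_week_start(2020, week))
```

The shift is at most a few days and varies by year, so it is barely visible on a weekly chart; the sketch only matters if exact dates are needed.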

Reminder: the talk page is not a forum for general discussion

It's currently difficult to sift through this talk page for actionable suggestions re: adjustments or additions to the article. I'm not sure how else to address the issue, since I'm only a casual editor. 2601:1C0:C802:3A00:159D:C91F:EE48:2147 (talk) 06:38, 18 August 2020 (UTC)

Plotting annual all-cause deaths from mortality.org

As a follow-up to #Plotting all-cause deaths from mortality.org, the following Python script ("plotHmdPerYear.py") outputs wiki-syntax for plotting total deaths in weeks 1-x of each year, where x is currently 27 for the U.S.; using only the first block of weeks makes the numbers reasonably comparable between years.

import sys, csv, datetime

fileName = sys.argv[1] # e.g. stmf.csv
countryCode = sys.argv[2] # e.g. USA
figureFieldName = sys.argv[3] # e.g. D0_14 (deaths for 0-14y), DTotal (total deaths)

data = []
file1 = open(fileName)
file1.readline() # Two intro lines without data
file1.readline()
for line in csv.DictReader(file1):
  if line and line["CountryCode"] == countryCode and line["Sex"] == "b":
    deaths = int(float(line[figureFieldName]))
    data.append( (int(line["Year"]), int(line["Week"]), deaths) )

data.pop() # Drop last two weeks for too big registration delay effect
data.pop()

year2020Weeks = [week for year, week, deaths in data if year == 2020]
maxWeek = year2020Weeks[-1]
years = sorted(list({year for year, week, deaths in data}))

deathsUpToMaxWeek = []
for year in years:
  deathsUpToMaxWeek1 = 0
  for year1, week, deaths in data:
    if year1 == year and week <= maxWeek:
      deathsUpToMaxWeek1 += deaths
  deathsUpToMaxWeek.append(deathsUpToMaxWeek1)
       
yearsOut = ", ".join([str(year) for year in years])
deathsOut = ", ".join([str(deaths) for deaths in deathsUpToMaxWeek])
sys.stdout.write("Last week in 2020: %i\n" % maxWeek)
sys.stdout.write("|x = " + yearsOut + "\n")
sys.stdout.write("|y = " + deathsOut + "\n")

The use is similar to what is described in #Plotting all-cause deaths from mortality.org, especially parameter values. --Dan Polansky (talk) 12:28, 11 August 2020 (UTC)

Charts 2

The charts for states with > 100,000 cases (now 17!) are somewhat crowded. Would it be useful to split them into 100,000 - 250,000 and > 250,000 cases? --Qumranhöhle (talk) 08:27, 11 August 2020 (UTC)

They are crowded, but whatever the usefulness of this type of chart, it is there even with the crowding. As I said elsewhere: "For each state, I can see 1) its current approximate case value, 2) the recent rate of change (slope of the curve), 3) the relation of the two pieces of information to other states. Sometimes the lines meet in a cluster but even then I can see what is going on; to wit, e.g. Massachusetts cumulative cases are now at about 120 000 and are growing slowly, whereas cases for other states in the same cluster grow faster, which include Tennessee, Louisiana, North Carolina and more."
To get better load balancing between the two charts, it would suffice to move the 100,000 threshold to, say, 120,000.
There is a deeper problem: the charts would be much more informative if they were 1) new daily counts instead of cumulative counts, and 2) per 100,000 pop instead of absolute counts. --Dan Polansky (talk) 09:02, 11 August 2020 (UTC)
That's a more general problem which, maybe, should be discussed separately from this one. Indeed, certain clusters will remain; on the other hand, a different scale on the y axis already allows for better orientation (i.e., only the 100,000s). Why not do both: move the threshold to 150,000 (?) and change the y axis in the > 150,000 cases chart? --Qumranhöhle (talk) 10:26, 11 August 2020 (UTC)
What change to the y-axis would you want to see? I for one am annoyed with charts that have the y-limit greater than 0 since they visually misrepresent the scale of the values. --Dan Polansky (talk) 10:43, 11 August 2020 (UTC)
As I wrote, just different steps on the y axis: only (e.g.) every 100,000 instead of every 50,000, starting at 0, of course. --Qumranhöhle (talk) 11:57, 11 August 2020 (UTC)
I see. I don't think it is the horizontal light gray lines of the grid that make the charts overcrowded. Nonetheless, I don't strongly object to going for 100,000 step as you propose. --Dan Polansky (talk) 12:52, 11 August 2020 (UTC)
It's worth a try. If it doesn't work, it is easy to undo it. --Qumranhöhle (talk) 14:21, 11 August 2020 (UTC)
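
The conversion suggested earlier in this thread (new daily counts per 100,000 population instead of cumulative absolute counts) can be sketched in Python as follows. This is a minimal sketch only: the function name daily_new_per_100k is made up, and the example uses toy numbers, not real state data or populations.

```python
def daily_new_per_100k(cumulative, population):
    """Convert cumulative counts to daily new counts per 100,000 population.

    The first day has no predecessor, so its value is None.
    """
    if not cumulative:
        return []
    rates = [None]
    for previous, current in zip(cumulative, cumulative[1:]):
        rates.append((current - previous) * 100_000 / population)
    return rates


if __name__ == "__main__":
    # Made-up numbers for illustration only, not real state data.
    toy_cumulative = [1000, 1200, 1500, 1500, 2100]
    toy_population = 2_000_000
    print(daily_new_per_100k(toy_cumulative, toy_population))
```

Daily differences are noisy, so in practice the result would probably be smoothed with a 7-day moving average before plotting.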