Donald J. Wheeler

Health Care

Some Perspective on the Covid Pandemic

What happened in July?

Published: Tuesday, July 28, 2020 - 12:03

With data that come along one number at a time, it is easy to get lost in the details. To see the big picture, it helps to use a time-series graph that will draw your eye in the direction that your mind wants to go. These simple graphs reveal how the values are changing over time and thereby place each value in context, making them more easily understood. Here we will look at some time-series that provide a global perspective on the Covid-19 pandemic.

Comparable numbers

With 329 million people, the United States is the third largest country in the world. China and India have more than four times the population of the United States, and fourth-place Indonesia has only 82 percent of the population of the United States. These size differences make direct comparisons misleading, which is why data are often normalized and expressed as rates per million population. While these rates provide an equitable basis for making comparisons, the per-capita rates are one step removed from the original values. A conversion is required to turn these rates into values we can expect to see in practice. This conversion is not complicated, but it nevertheless creates an obstacle for readers to either hurdle or stumble over.

Therefore, in the interest of clarity, the following summary will be based on the original values rather than per-capita rates. To create a rational basis for making comparisons using the original values, I have created groups of countries where each group has a combined population that is similar to that of the United States. This allows for reasonable comparisons using the original data.

The data

Here we shall use the daily numbers of confirmed cases of Covid-19, as reported by the European CDC, for each of the countries involved. These values are lower bounds for the actual number of cases of Covid-19 in these countries. However, these values, along with the number of deaths from Covid-19 each day, are the only values I have found that are available on a daily basis going back to the beginning of 2020 for all the countries around the world. For each group of countries defined below, these daily numbers of new cases will be combined into group totals.

Since some countries report these numbers only five days a week, I used a seven-day moving average to smooth out the weekly cycles and the day-to-day noise. These seven-day moving averages show the underlying trends for each country or group of countries. These averages will be plotted against the midpoint of the period represented. That is, the average for July 16 through July 22 will be plotted as the value for July 19.
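The smoothing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `centered_seven_day_average` and the sample counts are hypothetical, not taken from the European CDC data. It averages each run of seven consecutive days and labels the result with the middle day of that run.

```python
from datetime import date, timedelta

WINDOW_DAYS = 7  # one full weekly reporting cycle


def centered_seven_day_average(daily_counts, first_day):
    """Return (midpoint_date, average) pairs, one per full seven-day window."""
    results = []
    for start in range(len(daily_counts) - WINDOW_DAYS + 1):
        window = daily_counts[start:start + WINDOW_DAYS]
        # The midpoint of a seven-day window is its fourth day (offset 3).
        midpoint = first_day + timedelta(days=start + WINDOW_DAYS // 2)
        results.append((midpoint, sum(window) / WINDOW_DAYS))
    return results


# Hypothetical counts for July 16-22, 2020, from a country that reports
# nothing on the weekend (July 18 and 19 are recorded as zero).
counts = [700, 700, 0, 0, 700, 700, 700]
averages = centered_seven_day_average(counts, date(2020, 7, 16))
# The single window covers July 16 through July 22 and is plotted at July 19.
```

Because every window contains exactly one of each weekday, the two missing weekend reports in this made-up example are spread evenly across the week instead of showing up as a dip in the plot.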

The countries

The Western European group consists of Spain, Italy, France, Germany, and the United Kingdom. These five countries have a combined population of 324 million, which is 98 percent of the population of the United States.

The Pacific group consists of South Korea, Japan, the Philippines, and Australia. These four countries have a combined population of 311 million, which is 94 percent of the population of the United States. The time-series for the average daily number of new cases for these two groups are shown in figure 1.

Western Europe had a quickly rising number of cases in March 2020 that peaked at an average of 25,538 new cases per day on April 3, 2020. Since that peak the Western European averages have steadily declined to the present average of 1,860 new cases per day. However, the Pacific group never let the pandemic get out of hand. Presently about two-thirds of the new cases are coming from the Philippines, while the other three countries are maintaining very low growth rates.

Figure 1: Average daily new confirmed cases of Covid-19 for Pacific and Western European groups

India has 1,366 million people, which is 415 percent of the population of the United States. Here we will have to use a rate in order to obtain values to compare with those from the other groups. One-fourth of 1,366 million is 342 million, which is within the range of populations being considered. Therefore, the seven-day averages for India will be divided by four and the results plotted in figure 2.
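The scaling for India is simple arithmetic, and a short sketch may make it concrete. The populations below are the figures quoted in this article; the helper name `scale_down` and the sample average of 40,000 are hypothetical and used only for illustration.

```python
INDIA_POPULATION_MILLIONS = 1_366
DIVISOR = 4  # the article's choice for bringing India down to a US-sized group

# Group populations (millions) as quoted elsewhere in this article.
GROUP_POPULATIONS_MILLIONS = {
    "United States": 329,
    "Western Europe": 324,
    "Pacific": 311,
    "Africa": 360,
    "South America": 339,
}


def scale_down(value, divisor=DIVISOR):
    """Convert a figure for India into a per-one-fourth-of-India figure."""
    return value / divisor


# One-fourth of India's population: 341.5 million, rounded to 342 in the text.
comparison_population = scale_down(INDIA_POPULATION_MILLIONS)

# Check that this falls between the smallest and largest comparison groups.
smallest = min(GROUP_POPULATIONS_MILLIONS.values())
largest = max(GROUP_POPULATIONS_MILLIONS.values())
within_range = smallest <= comparison_population <= largest

# Hypothetical seven-day average, for illustration only.
example_average = 40_000
example_comparison_value = scale_down(example_average)  # 10,000 per day
```

Dividing by a round number keeps the conversion easy to undo: multiplying any plotted value for India by four recovers the original seven-day average.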

The African group consists of Egypt, Nigeria, and South Africa. These three countries have a combined population of 360 million, which is 109 percent of the population of the United States.

India has shown a slow but steady spread of Covid-19. Its current average daily number of new cases is 32,572. When this value is divided by four, we get a comparison value of 8,143 new cases per day per 342 million people. The African countries had a slow start in March, but they have shown a steady climb since then to the current average of 13,263 new cases per day.

Figure 2: Average daily new confirmed cases of Covid-19 for India and the African group

The South American group consists of Brazil, Colombia, Argentina, and Peru. These four countries have a combined population of 339 million, which is 103 percent of the population of the United States. The seven-day averages of new confirmed cases of Covid-19 for both the United States and the South American group are shown in figure 3.

Figure 3: Average daily new confirmed cases of Covid-19 for South America and the United States

The South American countries started more slowly than Europe and the United States, but their average number of new cases per day started to climb more rapidly after April 20, 2020. Between the last week of May and the last week of June, they even outstripped the United States in the average number of new cases per day. This group’s current average number of new cases per day is 49,103.

The United States went from an average of 123 new cases per day on March 8, 2020, to 31,942 new cases per day on April 8, 2020. After spending four weeks averaging about 29,000 cases per day, the United States dropped down to a plateau of about 22,000 cases per day for the next six weeks. Then, as the United States started to “open up,” we see a new wave of infection beginning in the middle of June that has taken the United States up to the current average of 67,212 new cases per day.


As of July 22, 2020, the world was averaging 227,336 new cases per day, which is about 1.6 million new cases per week. The five Western European countries, with 4.3 percent of the world’s population, are contributing 0.8 percent of these new cases.

The four Pacific countries, with 4.1 percent of the world’s population, are contributing 1.2 percent of these new cases.

India, with 17.9 percent of the world’s population, is contributing 14.3 percent of these new cases.

The three African countries, with 4.7 percent of the world’s population, are contributing 5.8 percent of these new cases.

The four South American countries, with 4.4 percent of the world’s population, are contributing 21.6 percent of these new cases.

And the United States, with 4.3 percent of the world’s population, is contributing 29.6 percent of these new cases. The U.S. infection rate of 67,212 new cases per day is 37 percent greater than the average infection rate in Brazil, Colombia, Argentina, and Peru (49,103 new cases per day). It is more than five times greater than the combined average infection rate in Egypt, Nigeria, and South Africa (13,263 new cases per day).

Moreover, for every new case in India, there are two new cases in the United States (32,572 vs. 67,212). Since India has four times the population, this means that the Covid-19 infection rate in the United States is eight times greater than the Covid-19 infection rate in India.
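The shares and ratios quoted in this summary can be reproduced from the seven-day averages given in the text. The following is a minimal sketch of that arithmetic, not the author's own worksheet; the variable and function names are hypothetical, and only the numbers come from the article.

```python
WORLD_DAILY_AVERAGE = 227_336  # world average new cases per day, July 22, 2020

# Seven-day averages of new cases per day, as quoted in the article.
daily_averages = {
    "Western Europe": 1_860,
    "Africa": 13_263,
    "South America": 49_103,
    "United States": 67_212,
}
INDIA_DAILY_AVERAGE = 32_572  # figure used in the US-India comparison
INDIA_POPULATION_FACTOR = 4   # India has roughly four times the US population


def share_of_world(cases, world_total=WORLD_DAILY_AVERAGE):
    """Percentage of the world's new cases, rounded to one decimal place."""
    return round(100 * cases / world_total, 1)


shares = {group: share_of_world(cases) for group, cases in daily_averages.items()}

us = daily_averages["United States"]
ratio_vs_south_america = us / daily_averages["South America"]  # about 1.37
ratio_vs_africa = us / daily_averages["Africa"]                # about 5.07

# Raw counts: about two US cases for every case in India.
raw_ratio_vs_india = us / INDIA_DAILY_AVERAGE
# Adjusting for population: about eight times the rate in India.
per_capita_ratio_vs_india = raw_ratio_vs_india * INDIA_POPULATION_FACTOR

for group, share in shares.items():
    print(f"{group}: {share} percent of the world's new cases")
```

The population adjustment here uses the same round factor of four as the text; using the 415 percent figure instead would give a slightly larger ratio.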

Every day the odds are going up that the next person you come in contact with will be spreading Covid-19. So wash your hands frequently. Wear a mask that covers both your nose and mouth whenever you leave your house. And keep your distance from others. More than 600,000 people have died from Covid-19, and more than 140,000 of these have died in the United States. Don’t be one of tomorrow’s 67,000 new cases because you made bad choices today.


About The Author

Donald J. Wheeler

Dr. Wheeler is a fellow of both the American Statistical Association and the American Society for Quality who has taught more than 1,000 seminars in 17 countries on six continents. He welcomes your questions; you can contact him at djwheeler@spcpress.com.



Thanks Don for Your Count Data Presentation

You know I am a fan of PPM (Parts Per Million). You have supported me saying it is "a robust method". Even so, people seldom understand PPM, especially when they are not used to that calculation. I am convinced that your last count data presentation will reach outside the statistical community to political people familiar with "mere counting data with graphs". I do not find any problems with the accuracy of confirmed cases, as we compare integers (first digit) and orders of magnitude (factors of 10) between countries at different phases of the progressing pandemic. I have found some simple metrics to use: If average human life span is 80 years, the flat death rate in population is 34 PPM per day. We now have decreased to 1 PPM Covid death rate in Sweden, average 10 per day on 10 million people, that is a pretty small proportion of 1/34 covid deaths, a 3% share of total 340 avg "normal deaths" per day. If we can keep that level we are very happy, but it still needs the same level of social distancing and hand washing. People here do not wear masks due to the public health recommendation that they do not help. I am one of very few when I have gone public twice the last four months. Officials here do not understand the calculations of mutual mask reduction of infection rate when moving around in public. As the US claims a public reduction factor of 5 for general masks, I have seen factor 45 reduction rates when both transmitter and receiver wear masks. As an electronics and acoustics engineer, I am familiar with various filter attenuators. I hope someone proves that the reduction rate with masks is at least a factor of 10. What is the infection rate attenuation of nurses' professional PPE: face shields plus face masks? Empirical data should be available by now.

Infections versus confirmed cases

As you state upfront, the numbers you are discussing are confirmed cases, not infections. Inside the article, however, you refer to "infection rates" for the daily number of new confirmed cases. There are two problems here:

  1. In the absence of random sample testing of the population at large, the infection rate is unknown. 
  2. The number of confirmed cases in a country depends on its testing program, which makes comparisons between countries dubious. 

In addition, the quality of census data varies greatly between countries. Even in the US, it is subject to politically motivated undercounts, and, even with the best intentions, many countries lack the resources to accurately count their populations. In many countries, both the number of confirmed cases and the population have such broad margins of error as to make the "confirmed cases per 100,000 inhabitants" numbers useless. 

On the data used

Every day I look at the European CDC database and extract the values for 30 countries around the world.

The population data they use comes from the World Bank, and is sufficiently accurate for the way it is used here.

This means that the counts of confirmed cases create lower bounds for the infection rates.

Think about this very carefully! The numbers of tests given in the different groups are not sufficiently different to explain the 37-fold difference found between Europe and the U.S.

Data quality

I believe your conclusions regarding the US and the Western European countries you list. I would trust Japanese data also. For the rest of the world, it's a different story. 

On the European CDC website, you can download the list of data sources for all the countries they cover. Most are health ministries. For population data, the World Bank does not have agents going around the world counting people. They may make adjustments but the raw data is from official censuses. 

For government statistics to be trustworthy, a country has to be democratic and rich enough. In dictatorships, the statistics say whatever the government wants to put out; in poor democracies, collecting and processing good data is unaffordable.

The data

How can the gap be so big between the US and the rest of the world, especially India, when grouped by comparable population numbers? Could it be that instead of showing that the US is more infectious, the data is really showing that the US is doing more testing than the other countries or is more transparent in the counts?

Testing the “testing hypothesis”

Let's examine the testing hypothesis as an explanation of the magnitude of the discrepancy between Covid 19 data from the US versus the rest of the world.

1) Are we really testing that many more people than the rest of the world? Admittedly, expressed as rates per million people, we test a lot, but nowhere near enough more to account for the magnitude of this difference.

2) If testing were a reasonable explanation for the number of cases we report, it should show up in our positivity rate. It doesn't.

3) If testing accounted for most of the difference between our case reports and those of other countries, then changes in our testing rates should co-vary with the magnitude of the differences between the cases reported by the US and by other countries. They don't. We continue to pull away from the pack regardless of what testing policy is in place at any given point in time.

4) I will refrain from commenting on our "transparency" as a possible explanation for our supremacy within the international community, except to note that this administration's recent attempt to divert data from going directly to the CDC was reversed only after it became publicly known and caused a severe backlash.

Operational definitions - Measurement

What impact do you feel the financial "incentive" for labeling a death a COVID-19 death, or the reliability of the testing (a lot of false positives being reported), has on this analysis?

July up-date

As a contributor to the first 3 columns in this series one might expect that I would no longer be surprised by the tricks Dr Wheeler pulls out of the Covid 19 bag.

However that is not the case. Each column manages to provide a new and informative perspective on this unfolding catastrophe. In spite of these continually changing perspectives, they all illustrate the same underlying message that no data have meaning apart from the context in which they have been gathered and within which they will be applied.

This latest column uses an interesting format to supply that context, while at the same time educating the reader about how to appreciate this new perspective. The coronavirus numbers themselves are so unrelentingly ominous that changing perspectives on them is an important way to keep them fresh and meaningful. Grouping data into geographically and culturally meaningful clusters that level the playing field when making comparisons across clusters avoids the temptation to massage the data to make a particular point.

These data do indeed represent the "voice of the Covid 19 process" as it unfolds as the global pandemic it has become. Viewing its growth rates within various regions of the world illustrates just how futile and irresponsible our efforts have been to contain its growth within the US. We are now in the unenviable position of having become a pariah within the international community, ironically encapsulated by the wall that a scientific analysis of Covid 19 data has imposed on us.

I can't say I look forward to what August's column will disclose since it is by now painfully obvious that we have chosen the once unthinkable option of letting the pandemic run its course. However, I do know that Don will once again find a way of presenting those data in a way that is informative, intellectually honest, and readily accessible.

Excellent overview of Dr. Wheeler's latest analysis

Thank you for your contributions earlier and also explaining so carefully and clearly what we have found here in Dr. Wheeler's paper.  You are right, we have built a wall, though maybe not the one some envisioned.  Myself and my family have done our part but when we do have to go out, it seems half of the people around us do not care.  Fine to not care about yourself, but your family and friends?  Ridiculous.  The results are brutally obvious.  To all that take issue with this analysis, or the previous ones, write your papers and we will discuss them.  I find them to be the best analyses on the web.  



You do not address the number of tests being conducted, and considering that most cases are asymptomatic, the 29.6% share of new cases is misleading. Can you address this using the numbers provided?

Excellent Analysis

Figure 3 of your article tells the whole story. Using the original data along with rational grouping makes it all more powerful.


Thanks for your statistical perspective

I enjoy your articles because numbers don't lie.  I guess they can be manipulated to lie by some, but that's usually political statistics.  :)

I'm not a numbers person but when I see your numbers, I'm truly astonished that as a country we are not doing something to get these numbers down. I don't know what the answers are, but part is individual responsibility to do the basics - wear a mask, wash your hands, socially distance. We are free in this country to do as we please, BUT ONLY if it doesn't harm others. Not taking personal responsibility can harm others in this pandemic.

Keep up your articles - hopefully we'll be able to see the US stats come down sooner rather than later.

Mary Chisholm

MicroRidge Systems

Grouping strategy

Surely a better grouping would be groups of countries conducting similar numbers of tests.  That would still be potentially misleading as what groups are being tested is important (those w/symptoms vs. general population for example.)  But it would likely be more meaningful than similar populations.

If wishes were horses then

If wishes were horses then beggars would ride.

It does no good to wish for data that are not available.

The 50 states are certainly as heterogeneous as the countries grouped together here.

Dr. Wheeler's Analysis is Excellent

1. Dr. Wheeler's Analysis is excellent. It does no one any good to complain about accurate or reliable data- there is no such thing-except in a theoretical world.

2. I would suggest that Dr. Wheeler should analyze the following data as well: COVID-19 deaths per 100,000 population; COVID-19 deaths per 100,000 COVID-19 patients; and these rates before and after "lockdowns". The groups could be the same countries as grouped in his present article.

On COVID-19 Data

I have been tracking COVID-19 cases and deaths in the US down to the county level for some Government dashboards, as part of a contract I'm currently on. We use the data from Johns Hopkins. I don't track data from other countries. I can tell you that there are often anomalies in the US data; some of it is structural (there are a number of states that don't turn in deaths numbers on Sundays -- and, apparently, some that don't turn them in on Mondays). Some of the anomalies are not easily explained. 

A couple of questions were raised in this discussion. One was about the "testing" question. Testing is one pretty good way to measure, and if we knew more about the reliability of the tests, it would allow for quantifying some of the uncertainty. Most of the databases I've seen have fairly extensive discussions on how the data were collected. For COVID, the numbers generally reflect diagnoses. I suppose you could argue that a doctor's diagnosis is not as definitive as a lab test, but the virus doesn't care whether your doctor has access to lab tests or not - it's going to infect you either way, and doctors have to diagnose their patients' problems whether they can get the tests or not. You can also argue about whether the tests are accurate, but then we have to deal with the conditional probability arguments around false positives and false negatives, and it would probably turn out that MD diagnoses are at least as reliable as lab tests, taken as a whole. 

And then there are cases that are unreported...we can only speculate about those, so I don't. I just recognize that for any number of very good reasons (and in some cases for very bad reasons), every number I get is probably low. How low, I can't know. I just know that the numbers reported are optimistic. In years to come, we will be able to do some time series analyses on deaths, and we should be able to estimate the excessive death rate during these months/years. That should be an indication of how low the death count was. Cases we probably can't know without some sort of universal antibody testing. The numbers reported by JH are "confirmed cases." I give my client all the caveats, so they know what I'm reporting. 

Do I wish the data were better, cleaner, more comprehensive? You bet. I get two numbers each day for each county in the US: a confirmed cases count and a deaths count. I would love to know how many of those cases are active cases. That number isn't universally reported. Same with hospitalizations. That makes finding things like deaths per some number of patients a challenge. If I had 10 or 20 dedicated analysts who could pore over data from all the disparate sources, I might be able to get those counts on a reliable basis, and some of that is available at higher levels of aggregation, but I am looking at US Counties and States, and I have to have data that are reasonably comparable county-to-county. That would take a bigger crew than I have budget for. 

There is already some evidence for the effects of lockdowns - most of it comes from Asia, where they practiced lockdowns, social distancing and masking much more rigorously than in many other countries. 

Regarding Cases Reported

Thanks for this post, Dr. Wheeler, and Rip, thanks for your work and your reply.

You say that the number of unreported cases is not known, but you "just know that the numbers reported are optimistic." In other words, that you suspect that they are lowball figures.  This has been a source of speculation among conspiracy theorists: That there are incentives for health systems to incorrectly attribute a death to COVID-19 that are related to funding or reimbursements or some such.  I don't understand the details of their theory, but the upshot is that they believe there to be some structural incentive to inflate death figures, and the "mass collusion" takes the form of a structural incentive to inflate death numbers.  With your experience under your contract, looking at county data and some of the details of collection methodology, do you have any opinion on this matter?



Within county data?

County-level data sounds highly detailed when considering the entire country. Within the county you live, on the other hand, you would very much like to have finer data. Santa Clara County, CA, has 2 million inhabitants. Half the population lives in San Jose; 1/20, in Palo Alto. These are radically different environments but I have not been able to find data at the city or neighborhood level. 

And yes, as Don Wheeler says, the number of confirmed cases is a lower bound to the number of infections, but we don't know whether the number of infections is 5 or 10 times the number of confirmed cases. All it would take to find out is testing random samples of a few thousand members of the general population. As the size of this multiplier is a function of testing practices, such studies would have to be done in multiple places. As far as I know, this has yet to be done anywhere.

As a country, we have the resources to poll constantly on voter intentions and on TV viewership but we haven't been able to do it to estimate how many of us are infected. Do we not want to know?