Our COVID-19 data FAQ: Why we present data the way we do



When we published the first version of our map tracking COVID-19 cases in Pennsylvania, state health officials had confirmed just a dozen cases of the disease, all clustered along the eastern edge of the Commonwealth. 

That was a month ago. At the time we write this, almost 20,000 Pennsylvanians scattered across all 67 counties have tested positive for COVID-19. So far, 416 have died, and more than 93,000 have tested negative. These figures rise each day as state health officials release new data on tests, hospitalizations and fatalities.

The Capital-Star is one of many news outlets tracking COVID-19 in Pennsylvania in a series of data visualizations. Along the way, we’ve had to make editorial judgments about what kind of data to display and how to visualize it all in a way that will inform the public responsibly.

We’ve tried to make decisions about our graphics in consultation with public health experts and epidemiologists. We’ve also been lucky to get lots of helpful feedback from readers. We compiled some of our most frequently asked questions here to give some insight into our decision-making, which stands to evolve as the pandemic progresses.

Why are you using a logarithmic scale?

For the first month of the pandemic, we charted the total number of Pennsylvania cases on a basic line graph with a linear scale. The result was a curve that looked like a hockey stick.

We changed the Y-axis of that graph to a logarithmic scale on April 6. That made the curve display as a straight line, still on an ever-increasing trajectory. 

The same data displayed with a linear y-axis (left) and a logarithmic y-axis (right).

The COVID-19 outbreak began in Pennsylvania on March 6 with two confirmed cases; a month later, the case count had passed 10,000. Datasets with that wide a range of values are hard to read on a linear scale, because the early values get squeezed flat against the bottom of the chart.

Logarithmic scales help us get around that problem. As the data journalist John Burn-Murdoch explains in this video for the Financial Times, logarithmic scales account for the natural growth pattern of viruses like COVID-19, which tend to grow exponentially while they spread unchecked.
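To see why a logarithmic axis straightens the curve, here is a minimal Python sketch. The case counts are made up for illustration (a total that doubles at every step), not Pennsylvania data, and the variable names are our own.

```python
import math

# Hypothetical cumulative case counts that double at every step.
# Illustrative numbers only, not Pennsylvania data.
cases = [2 * 2 ** step for step in range(11)]

# On a linear axis, the gap between consecutive points keeps growing,
# which produces the "hockey stick" shape.
linear_steps = [later - earlier for earlier, later in zip(cases, cases[1:])]

# On a logarithmic axis, a point's height is proportional to log10(value),
# so steady doubling produces equal gaps: a straight line.
log_steps = [
    math.log10(later) - math.log10(earlier)
    for earlier, later in zip(cases, cases[1:])
]

print("Linear gaps:", linear_steps)
print("Log gaps:   ", [round(gap, 3) for gap in log_steps])
```

The linear gaps grow at every step, while the log gaps are all the same size. If growth were to slow, the log gaps would shrink and the line would start to flatten.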

Since our graph tracks the total number of cases to date, we’ll never see the number of total cases decrease, even as people recover. (We don’t have data to tell you how many people have recovered from COVID-19 and how many are still contagious or symptomatic.)

Eventually, the chart will also make it easier to see the number of new cases slowing down over time: as growth slows, the line will bend away from its straight path and begin to flatten.

Some readers have suggested that the new graph makes the threat of COVID-19 look less severe. For that, we’ll defer to Burn-Murdoch, who put it this way in his video: “We’re not trying to play down the rate at which it increases — we’re trying to emphasize that this exponential spread is something you see everywhere.”

Why not adjust your data based on population?

One of the first pieces of feedback we started hearing on our data dashboard went something like this: “Your map showing the number of cases in Pennsylvania just shows which areas in the state are the most densely populated. Why not adjust the cases to show how many cases there are relative to the population?”

We posed this question to Amanda Makulec, a public health professional who specializes in visual data communication. She warned against limiting analyses to population-adjusted data while there are still a lot of factors affecting our COVID-19 case counts. 

For starters, testing is still limited in Pennsylvania and nationwide. Counties that have more robust medical systems and testing capacities may appear to have higher rates of COVID-19 than more rural, sparsely populated counties simply because more of their residents have been tested.

Cindy Prins, a professor of epidemiology at the University of Florida, also pointed out that population-adjusted rates don’t account for population density or other factors that can affect transmission. 
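For readers curious about the arithmetic, the adjustment itself is simple. The sketch below uses two hypothetical counties with invented case counts and populations. It is an illustration under those assumptions, not a calculation we publish.

```python
# Hypothetical counties: (confirmed cases, residents).
# Invented numbers for illustration only, not real Pennsylvania data.
counties = {
    "Urban County": (1200, 1_500_000),
    "Rural County": (40, 45_000),
}


def cases_per_100k(cases, population):
    """Return confirmed cases per 100,000 residents."""
    return cases / population * 100_000


rates = {
    name: cases_per_100k(cases, population)
    for name, (cases, population) in counties.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} confirmed cases per 100,000 residents")
```

In this made-up example, the rural county has far fewer cases but a slightly higher rate. Neither figure says anything about how many residents were tested or how densely they live, which is the caveat Makulec and Prins raised.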

As the pandemic progresses, and we have testing data for a wider swath of the population, we may decide to display population-adjusted case counts in addition to the raw data. Until then, we’re following the advice of Makulec and Prins to give readers the information that is most relevant to them, in the simplest way possible.

“If I see that there are a number of cases that are near me in my county, that may give me greater pause before making more trips to the grocery store or doing more outside of my home,” Makulec said. “It’s certainly imperfect … but I think that’s the more accurate and actionable information.”

Why aren’t you displaying new cases per day?

At the time we write this, we don’t have a chart that shows how many new cases Pennsylvania health officials are reporting each day. Public health experts say that this data point will be the one that determines when we can return to normal life. 

There’s a simple explanation for this one: we don’t have the manpower to track all the COVID-19 data we get every day. Each of our graphics requires manual data input from our staff, including on the weekends. If we find a way to make that work more efficiently or decide we want to invest time in another graphic, a chart showing new cases per day would be one of the first we roll out.
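The calculation behind such a chart would be straightforward once the daily totals are recorded: each day's new cases are that day's cumulative total minus the previous day's. A minimal sketch, using invented totals rather than state data:

```python
# Hypothetical cumulative totals on consecutive days.
# Invented numbers for illustration only, not state data.
cumulative = [2, 4, 6, 10, 12, 16, 22]

# New cases on a given day = that day's total minus the previous day's total.
new_per_day = [
    today - yesterday
    for yesterday, today in zip(cumulative, cumulative[1:])
]

print(new_per_day)
```

The first day has no entry because there is no earlier total to subtract from it.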

Why not display an infection rate?

We’ve also had readers ask us if we could calculate an infection rate each day, based on how many tests reported to the Department of Health come out positive.

Following the advice of public health professionals, we’re going to leave these kinds of calculations to the experts. We would hate to publish a crude calculation based on incomplete testing data that makes the virus seem more or less dangerous than it is. We’ve declined to calculate and publish daily fatality rates for the same reason. 

I have another question that wasn’t answered here.

You can email us at [email protected] with questions about the Capital-Star data visualizations. We also recommend these articles for further reading, all of which we’ve consulted at some point while designing and maintaining our data visualizations: