An Investigation of the CDC’s “Excess Deaths Associated with COVID-19” dataset

By: Travis Keyser

In a very spirited year where quite a bit of noise was injected into the science of determining mortality in the United States, I wanted to dig a bit deeper into the most reputable (or at least I hope most reputable) dataset concerning mortality in the United States.  Since this data is almost exclusively referred to when making policy/lockdown/travel restriction decisions based on science, I was curious to see if the observed mortality data followed any discernible patterns that could help me understand how some of those decisions are coming to pass.

I’m aware that many will view my attempt to quantify and rationalize mortality as heartless and cruel.  My sincerest sympathies go out to those who have lost loved ones in this trying year.  I wish I knew a way to get these points across without boiling it down to numbers…but I do not.  These numbers are being used in this same manner to make extremely important decisions in our country, so it is necessary to present them in this same manner.

To be clear, this examination is not an effort to prove/disprove the existence of any conspiracy theory or try to discount the effects that the SARS-CoV-2 virus has had on American society and culture.  What I will try in earnest to do is to capture the scope of the issue so that consumers of this data can hopefully put it into a context that each can relate to a bit more personally.  

To start, the assertion that 350,000 souls were lost in a year due specifically to one particular cause lacks context at best and is meant to manipulate sentiment at worst.  For the record, I do not believe the latter although I make room for its existence.  What I believe is possible, or more likely probable, is that when faced with the optics of surging death rates due to COVID, well-intentioned people of influence and power flinched in the absence of clear data and historical context.  So how does one strip away the noise and tune in to the signal?  The best way I know how is to take a look at overall mortality in the United States and compare it to what was expected.  In other words, if at the outset of 2020 we predicted the mortality of a certain percentage of our population given historical norms and maybe a few other known external factors, at what point do we consider the actual observed mortality of 2020 vastly different than what was expected?  I believe if one can effectively determine this then the scope of the issue can be evaluated; context and perspective can be formed in order to more rationally assess this situation.

Some basics:

  • All of the data in this analysis is taken from the following dataset readily available and confirmable on the CDC’s website.  Here is the link for those interested in doing so.  “Excess Deaths Associated with COVID-19” 
  • For reasons unclear to me, the science of mortality uses deaths per 100,000 population as a yardstick when comparing time periods, geographies, age groups, etc. so I will do the same here
  • Data in this dataset is collected/reported weekly.  As such all rates are given as weekly. (e.g. a weekly expected or observed mortality rate of 18.1 means that if we started off a certain week with 100,000 people of a given population, we would expect/observe 18.1 deaths in that week)
  • I use trailing averages quite a bit in this analysis.  Trailing averages can be useful in muting the effects of wild near-term swings in datasets rather than looking at strict calendar weeks/months/years.  One weakness they have is that the data need to ‘bake’ for the trailing period to be relevant.  Say you have a dataset that starts on 1/1/2017.  If you desire to look at a trailing 28-day measurement of that data, your analysis needs to start on 1/28/2017, since that is the first day on which 28 full days of data are available to calculate your trailing average…anything before that will not have the fully-baked effect of 28 full days.  Similarly, if you wanted to look at a trailing year for the same dataset, you’d need to start your view of the analysis at the very end of 2017 (12/31/2017 for a 365-day window).
  • The CDC has chosen to separate the data in a couple of useful, but non-standard geographies.  They have separated New York City’s data from New York State…so NYC essentially appears as its own state.  In addition, Puerto Rico’s data is included in this overall dataset.
  • I have assigned regions of the US as best I could in order to try to determine more macro effects on geographies rather than look at all individual states separately.  I have no doubt that in some instances this can cause the appearance of trying to sway the data.  This is not the intent, but I have not done a full dive on if/how this could mislead any of my analysis, so I welcome any suggestions to make this more clear/concise.
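
To make the two yardsticks above concrete, here is a minimal sketch of a weekly rate per 100,000 and a trailing average that stays empty until the window has fully ‘baked’.  The function names and the sample numbers are hypothetical and chosen only for illustration; they are not taken from the CDC dataset.

```python
from collections import deque


def weekly_rate_per_100k(deaths, population):
    """Deaths in one week, expressed per 100,000 people."""
    return deaths / population * 100_000


def trailing_average(values, window):
    """Trailing mean over `window` periods.

    Positions without a full window of history are returned as None,
    because the average has not 'baked' yet.
    """
    recent = deque(maxlen=window)  # keeps only the last `window` values
    out = []
    for value in values:
        recent.append(value)
        out.append(sum(recent) / window if len(recent) == window else None)
    return out


# Hypothetical numbers, for illustration only.
rate = weekly_rate_per_100k(deaths=181, population=1_000_000)  # about 18.1
weekly_rates = [18.0, 19.0, 17.0, 18.0, 20.0]
smoothed = trailing_average(weekly_rates, window=4)
print(round(rate, 1), smoothed)
```

With a window of 4, the first three entries of `smoothed` are None, which is the same reason a trailing 28-day view of data starting 1/1/2017 cannot begin before 1/28/2017.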

With that let’s dig in.

The Observed Historical Death Rate (downloaded from https://wonder.cdc.gov/)

The CDC reports observed deaths for each year.  The CDC has confirmed numbers for the 10 years spanning 2009-2018.  This is the graph of that average weekly death rate over time.  As you can see, for the decade ending 2018 the average weekly death rate steadily climbs about 1.0% each year.

Given this ten-year historical trend one could logically assume that this rate would continue to grow at the same 1.0% annually for 2020, 2021…(although the period from 2017-2019 seems to be levelling off…which could be critical in context).  
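
As a rough illustration of what “the same 1.0% annually” would mean, the sketch below compounds a weekly rate forward by a fixed annual growth figure.  The starting rate is a hypothetical placeholder, not a value read from the CDC data, and compounding is an assumption about how the trend would continue.

```python
def project_rate(base_rate, annual_growth, years):
    """Compound `base_rate` forward by `annual_growth` for `years` years."""
    return base_rate * (1 + annual_growth) ** years


# Hypothetical starting point, for illustration only.
base_weekly_rate = 16.0   # deaths per 100,000 per week
growth = 0.01             # the ~1.0% per year described above

for years_ahead in (1, 2, 3):
    projected = project_rate(base_weekly_rate, growth, years_ahead)
    print(years_ahead, round(projected, 3))
```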

Why this point is extremely important – In its “Excess Deaths Associated with COVID-19” dataset, the CDC makes mortality predictions on a weekly basis going all the way back to January of 2017 (I’ve asked the CDC to provide these projections for further back in history, but have not yet been afforded them…so have to do with this smaller dataset for now).  These predictions are the basis by which the CDC determines whether that particular week’s eventual observed/actual mortality was within/outside expectations.  It follows that inaccurate or ill-informed predictions have the ability to force inaccurate conclusions and bring with them dire consequences.  What might appear to be innocuous changes to these predictions can have massive downstream effects.

So given the history of the previous ten years, when we look at the CDC’s predictions something odd stands out.  On or about Christmas 2019 those weekly death rate prediction numbers stop growing…and start shrinking.

Take a look at this chart derived directly from the CDC’s “Excess Deaths Associated with COVID-19” dataset

The black line is the CDC’s best guess on weekly death rates (trailing year to smooth out seasonality).  The red line is the upper bound of that predicted death rate.  Think of the red line as the tripwire that sounds the alarm that something is afoot that needs to be taken into consideration (I’ve asked the CDC for some guidance as to how this upper bound is calculated, but as yet have not received an answer…it would appear from the data that the calculated upper bound is simply ~8% more than the predicted rate at any given moment).  If/when the observed death rate crosses this upper bound, the death rate has far exceeded what was predicted.  This graph clearly shows that, after years of steady growth, starting around Christmas of 2019 the CDC believed that the death rate would reverse course and start retreating at about the same rate (-1.0%) as it was previously growing.  I reached out to the CDC on this and received a not-so-convincing explanation which I will share here.
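
The tripwire idea can be sketched in a few lines.  Note the assumption: the 8% margin is only my reading of the data as described above, not a documented CDC formula, and the rates below are hypothetical.

```python
# Assumption: upper bound is roughly 8% above the predicted rate.
# This is an observation from the data, not the CDC's published method.
UPPER_BOUND_MARGIN = 0.08


def approx_upper_bound(predicted_rate, margin=UPPER_BOUND_MARGIN):
    """Approximate upper bound as predicted rate plus a fixed margin."""
    return predicted_rate * (1 + margin)


def weeks_over_threshold(observed, predicted, margin=UPPER_BOUND_MARGIN):
    """Indexes of weeks where the observed rate exceeds the approximate bound."""
    return [
        week
        for week, (obs, pred) in enumerate(zip(observed, predicted))
        if obs > approx_upper_bound(pred, margin)
    ]


# Hypothetical weekly rates per 100,000, for illustration only.
predicted = [17.0, 17.0, 17.0]
observed = [17.5, 18.2, 19.0]
print(weeks_over_threshold(observed, predicted))
```

In this made-up example only the last week trips the alarm: 19.0 is above the approximate bound of about 18.36, while 18.2 is above the prediction but still under the bound.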

“There are a few things to consider.  First, the difference in expected counts are only about 1-2% lower in 2020 than in 2019.  Second, the 2017/2018 flu season was exceptionally bad and the 2018-2019 flu season had fewer deaths than average. Population growth may also play a small part.”

The statement “…the difference in expected counts are only about 1-2% lower in 2020 than in 2019” is troubling to me.  It implies that these rates are essentially the same and shouldn’t be seen as being drastically different year over year.  But about 3 million people die in the US every year.  The difference between the CDC predicting that 2% fewer people will die in a year and predicting that 2% more will die represents a possible 120,000-death gap.  I would argue that this is not insignificant.  Additionally, the references to the 2017-2018 & 2018-2019 flu seasons are relevant; however, I draw the exact opposite conclusion to the one being implied by my contact at the CDC.  Let’s take a look at the observed death rate overlaid on this graph to get a better idea of what I am referring to:
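
The 120,000 figure is simple arithmetic on the round numbers above (about 3 million deaths per year, and a swing from 2% below to 2% above).  A quick check, using integer math to avoid rounding noise:

```python
annual_deaths = 3_000_000   # round figure used in the text
swing_percent = 2           # predicted 2% lower versus 2% higher

predicted_low = annual_deaths * (100 - swing_percent) // 100
predicted_high = annual_deaths * (100 + swing_percent) // 100
gap = predicted_high - predicted_low

print(predicted_low, predicted_high, gap)
```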

The light blue line is the observed (actual) weekly death rate (trailing year) for Jan 2018 through late December 2020.  The 2017-2018 flu season (incidentally a horrible flu season that briefly exceeded pandemic levels similar to those experienced recently) referred to by my contact at the CDC starts the graph out on the left.  The observed death rate (light blue) coming out of that bad 2017-2018 flu season tracks above the predicted (black line) death rate.  Then something strange happens.  During the exceptionally light 2018-2019 flu season, the observed death rate dives well below the predicted death rate (about 2%) and stays there for over a year!

What my contact at the CDC (who was actually extremely helpful and kind) uses to justify a reduction in predicted death rate (the light 2018-2019 flu season) should actually be used to ratchet the death rate prediction upwards…not downwards.

Here is where I start to feel like an unfeeling robot in speaking about deaths, but again…we need to apply sound logic whenever we can.  These numbers are used everyday in the media with possibly the best of intentions to draw attention to their seriousness…or…more sinisterly…to scare the public into a narrative.  Let’s try to look at them in a rational, objective manner.

Here goes…

A light flu season (exceptionally light in 2018-2019) has one large downstream echo.  The elderly/weak population that was inherently spared by the light 2018-2019 flu season is now one year older during the 2019-2020 flu season and presumably one year weaker.  If just a regular (run-of-the-mill) version of the flu were to hit in 2019-2020, one should come to expect those who were barely spared during 2018-2019 to succumb in 2019-2020 in addition to those that would normally be susceptible to a normal flu virus in a normal flu season.  Fan the flames with the fact that the CDC publicized its belief that fewer people would die than in the previous years leading up to that and you have a full conflagration.  Add to that a previously unencountered virus with a penchant for the elderly/weak, and you have a confluence of factors that can shake a society to its core if it is operating in a vacuum and only concerned with the tallying of inevitable deaths as the signature of a pandemic.

But let’s take a look at the above graph and extrapolate the predicted and upper bound trends through January 2020 at the same rate as they had proceeded in the previous years.  While we would be operating outside of our predicted norms (the blue observed death rate line crosses the extrapolated predicted deaths (pink line) about July 2020), we would still…to this day…be operating inside of the extrapolated upper bounds (yellow line).
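
One simple way to extend a trend line like this is to fit a straight line to the earlier points and carry it forward.  The sketch below uses an ordinary least-squares fit; that choice is an assumption made for illustration and is not necessarily how the pink and yellow lines on the chart were drawn.  The sample values are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate(xs, ys, future_xs):
    """Extend the fitted straight line to the `future_xs` positions."""
    slope, intercept = fit_line(xs, ys)
    return [intercept + slope * x for x in future_xs]


# Hypothetical trailing-year predicted rates, for illustration only.
periods = [0, 1, 2, 3]
predicted = [16.0, 16.1, 16.2, 16.3]
extended = extrapolate(periods, predicted, future_xs=[4, 5])
print([round(value, 2) for value in extended])
```

The same function can be applied to the upper bound series, which gives an extended tripwire to compare the observed rate against.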

This is where we come to the more subjective portion of my analysis.  Everything up to now has been based in hard fact easily verified by the aforementioned CDC dataset.  What follows is part logical conjecture on my part and part educated opinion.  While it makes sense to me, I’m always interested in challenges to my conclusions and welcome any and all who can offer other constructive methods to either debunk, correct or enhance my data and conclusions.

Our PCR (Polymerase Chain Reaction) testing and testing procedures for SARS-CoV-2 are flawed at best…pointless at worst.  

  1. For reasons I outline below, I believe a large majority of those being counted as COVID deaths died with a positive PCR SARS-CoV-2 test…not necessarily of COVID itself.  For this reason I chose to study overall mortality rather than trust the reported causes of death.  We’ve all seen the stories about hospitals getting paid more for COVID treatments…fine…I don’t believe this is a conspiracy, but I think it is possible that this was an overall well-intentioned, but misguided attempt to heighten awareness of the virus.  The unfortunate result has been to blur the story and distort the facts.  In the end, you can’t (or at least I don’t think you can) hide overall deaths or call them something you don’t want to call them.  Death is a binary condition…and not subject to interpretation.  
  2. PCR is a DNA amplification technology.  PCR technology takes a sample of DNA and copies it, then copies that, then copies that…(etc.) until you have enough of that sample to test for the presence of whatever you are looking for.  I believe the lowest level of amplification available for prevalent SARS-CoV-2 tests is 33 cycles…with some as high as 40 cycles.  This level of amplification is extremely useful when trying to determine if a patient still has copies of the AIDS virus in their blood…less so if the purpose is to find sick people with SARS-CoV-2, as we are essentially amplifying everything in the background.  Some data suggest that any amplification cycles over 17 are essentially useless.  This article better articulates the nuances associated with PCR testing (https://www.medpagetoday.com/infectiousdisease/covid19/90508), but PCR SARS-CoV-2 testing nevertheless is far from the binary determination that most people believe it to be.
  3. Never in history have we used a PCR diagnostic test (a test designed to see if sick people are sick) to screen a large healthy population.  I’d be interested to see what our results would be if we tested an entire healthy population in the same manner for, say…the flu?  I think there is a solid chance that those results would be comparable to what we are seeing with current SARS-CoV-2 testing (would we see asymptomatic carriers of the flu?…or maybe more to the point, PCR flu-positive subjects, that don’t have the flu?).
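
For a sense of scale on the “copies it, then copies that” description in point 2, here is a minimal sketch of idealized amplification, assuming the target doubles perfectly on every cycle.  Real reactions are less than perfectly efficient, and this arithmetic by itself says nothing about how a given cycle count should be interpreted clinically.

```python
def ideal_amplification(cycles, efficiency=1.0):
    """Fold-increase after `cycles` cycles.

    efficiency=1.0 means perfect doubling each cycle (an idealization).
    """
    return (1 + efficiency) ** cycles


# Cycle counts mentioned in the text above.
for cycles in (17, 33, 40):
    print(cycles, f"{ideal_amplification(cycles):.3g}")
```

Under this idealization, each additional cycle doubles the amount of target, so 40 cycles produces 2^7 = 128 times as much as 33 cycles.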

The point that follows is true opinion and not backed by anything other than gut feeling and a loose comparison of death rates between states that took extreme SARS-CoV-2 mitigation measures and those that did not (I looked at the US Northeast vs. the US Southeast and there is a stark comparison). 

Northeast

Southeast

I believe that our efforts to mitigate the spread of SARS-CoV-2 have done one thing and one thing only.  They made those leaders and people in power feel like they took action…that is it.  In the face of what they were being told was armageddon, our leaders flinched.  Once they flinched, they injected a massive amount of fear into the system.  That fear became so great and so pervasive that it still has people looking for bad news in everything.  Make no mistake, fear is a powerful (perhaps the most powerful) motivator.  It is this fear that has many entities (school systems, states, cities, etc.) scouring data for the bad news and constantly changing the metrics to fit a narrative.  Many have suggested only a return to normalcy once SARS-CoV-2 is 100% eradicated.  This is a dangerous, unreasonable and most likely unattainable goal.  The flu (despite the existence of myriad vaccines) is still with us…SARS-1 is still with us…swine flu, bird flu, MRSA (remember that death sentence from the early 2000s?) are all still active in the population and take lives every year.  So what we really need is to arrive at an acceptable death rate for COVID.  I know how that sounds, but we all calculate acceptable death rates subconsciously for all human tragedies and live our lives accordingly:

  • 650,000 / year die of heart disease – yet most Americans are overweight and technically unhealthy – staying in shape is the one tried and true way to tip the scales in one’s favor here (and for staving off COVID for that matter)
  • 600,000 / year die of cancer – some seemingly unavoidable yet Americans still smoke, drink and engage in known cancer-causing behaviors without much thought to this inevitability
  • 170,000 / year die of unintended accidents (automobile crashes, falls, drownings, choking, etc.) ~500 people per day…but most people go about their days unaffected by this reality
  • 50,000 / year kill themselves – and it is my belief that when the dust settles on the COVID-era mortality statistics…this number will see a sharp increase and thus must be accounted for in the unintended consequences of COVID mitigation measures…along with increased drug overdoses, alcohol abuse, domestic/child abuse, etc.  SARS-CoV-2 mitigation strategies have been sanctified by many, but if the true end game was always to preserve human life…let’s just say the road to hell is paved with good intentions.

In closing, I hope this helps to put a bit more context on the issue at hand.  Yes, SARS-CoV-2 is a real virus that causes a real disease…COVID.  Yes, COVID has killed many.  Yes, COVID has contributed to the fact that more people died in 2020 than we would have predicted in the time that it has been with us…but No, the sky is not falling.  By my highest estimate, the number of excess deaths in 2020 (due to all causes…inclusive of those unintended deaths caused by SARS-CoV-2 mitigation measures) hovers around 100,000.  To be sure this is a lot…however the vast majority of these individuals were over the age of seventy and not in great health…and a growing percentage of these that are under 70 are due to non-natural deaths (overdoses, suicides, homicides, etc.).  I can indirectly prove this now, but am awaiting a request made to the CDC to bundle their data a little differently to allow me to make this point more directly and concisely.  

I understand that there are concerns about the downstream health consequences of COVID.  These are valid concerns.  These concerns are just as valid as the downstream consequences of the flu virus, or any other virus that attacks multiple systems in the body.  These concerns never drove major political policy decisions in the past…I’m curious as to why we are starting now.  Direct correlations of downstream effects of viruses on human beings are tough to draw…although we know they exist.  Direct correlations of extreme social, economic and cultural measures imposed on a society are a bit easier to make.  Suicide rates, overdose rates, domestic violence rates have all risen.  In addition, we have an entire generation of school-aged children that have essentially lost a year of their youth and most of the human experiences/attainment of knowledge that go along with that…the belief that this won’t come home to roost in myriad different negative ways is myopic.  Life is a constant cost-benefit analysis and there are no risk-free decisions.  It is my hope that by using data and context, I can help individuals/families/leaders make the most informed, most rational decisions possible, and that we can hasten a return to normalcy sooner rather than later.

I have published this dashboard for the purposes of public interaction.  I implore you to take a look at how the different regions/states have fared compared to others.  You can adjust the trailing period to see how seasonality affects the cadence of our mortality: 364 days (52 weeks x 7 days) is a trailing year, 728 days (104 weeks x 7 days) is a trailing two years, 91 days (13 weeks x 7 days) is a trailing quarter, and 28 days (4 weeks x 7 days) is usually interesting to look at as well.
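
Because the data is weekly, each trailing period is a whole number of weeks multiplied by 7.  A small sketch of that conversion (the labels are just descriptive names chosen here, not settings taken from the dashboard):

```python
DAYS_PER_WEEK = 7

# Trailing windows expressed in weeks; labels are illustrative only.
trailing_windows_in_weeks = {
    "four weeks": 4,
    "quarter": 13,
    "year": 52,
    "two years": 104,
}

trailing_windows_in_days = {
    label: weeks * DAYS_PER_WEEK
    for label, weeks in trailing_windows_in_weeks.items()
}
print(trailing_windows_in_days)
```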

If you can get past the sheer morbidity of it, it’s actually pretty fascinating.

https://app.powerbi.com/view?r=eyJrIjoiOGY3ZjgzNjUtOTAwYy00MTY0LThhNzYtMzQzN2QxNGI1M2U5IiwidCI6IjljNzViZDA4LThmYTMtNDdiMC1iNjIwLWZlMmI1NjFmZmM1NCIsImMiOjF9