The CDC Publishes an Extremely Flawed Study on SARS-COV-2 & Diabetes
On January 7, 2022, the Centers for Disease Control and Prevention published a case study (archive) on the connection between SARS-COV-2 infection and diabetes. As clearly stated in the title, the study analyzed the incidence rate of diabetes >30 days post-infection among individuals under 18. The study concluded that individuals under 18 who were infected with SARS-COV-2 had up to a 2.5x greater risk of being diagnosed with diabetes, and the CDC subsequently posted its findings on Twitter to encourage vaccination.
A 2.5x higher risk sounds scary, right? However, upon reading the actual study, I found several problems with it.
In epidemiology, there is a difference between absolute risk and relative risk. The former, when comparing two groups, is the rise or decline in risk measured in percentage points, whereas the latter measures how many times higher (or lower) the risk is upon exposure. This is something I already covered in a previous article, but for those who don't want to jump to another article, here's the example I used:
The "up to 2.5 times more likely" phrase from the CDC's tweet refers to its analysis of the IQVIA database. Of the 80,893 individuals who tested positive, 68 were diagnosed with diabetes. The incidence rate is therefore roughly 0.08%. In contrast, of the 404,465 individuals who tested negative, 132 were diagnosed with diabetes, for an incidence rate of roughly 0.03%.
While the relative risk is indeed about 2.5, that does not tell the entire story. You need both the relative and the absolute risk, and the latter is the difference between the two incidence rates: 0.08% - 0.03% = 0.05 percentage points. In layman's terms, exposure to SARS-COV-2 increases the risk of diabetes by 0.05 percentage points. That is an extremely low figure. The same is true when we look at it in terms of the number needed to harm (NNH), which is the reciprocal of the absolute risk increase and clocks in at 1 / 0.0005 = 2,000. To explain what this number means succinctly, you would have to infect 2,000 individuals under 18 to get one additional case of diabetes. That is incredibly large.
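For readers who want to check the arithmetic above, here is a minimal Python sketch that computes the crude (unadjusted) measures from the IQVIA counts quoted in this article. The function name `risk_summary` and its parameter names are my own for illustration and do not come from the study.

```python
def risk_summary(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Crude risk measures from two groups' case counts and group sizes."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    # Absolute risk increase, as a proportion (multiply by 100 for percentage points)
    risk_difference = risk_exposed - risk_unexposed
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        "relative_risk": risk_exposed / risk_unexposed,
        "risk_difference": risk_difference,
        # Number needed to harm: reciprocal of the absolute risk increase
        "nnh": 1 / risk_difference,
    }


# IQVIA counts as quoted in the article
iqvia = risk_summary(68, 80_893, 132, 404_465)

print(f"Incidence, tested positive: {iqvia['risk_exposed']:.3%}")
print(f"Incidence, tested negative: {iqvia['risk_unexposed']:.3%}")
print(f"Relative risk:              {iqvia['relative_risk']:.2f}")
print(f"Absolute risk increase:     {iqvia['risk_difference'] * 100:.3f} percentage points")
print(f"Number needed to harm:      {iqvia['nnh']:.0f}")
```

Without rounding the incidence rates first, the relative risk comes out at about 2.58 and the NNH at roughly 1,945, which is in line with the rounded figures of 2.5 and 2,000 used above.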
However, this is only referring to the IQVIA database. The study also analyzed data from HealthVerity, which yielded different numbers. 1,120 of the 280,767 positive individuals were diagnosed with diabetes, whereas 853 of the 281,072 negative individuals were. The incidence rates would therefore be roughly 0.4% and 0.3%, respectively. Rather than a 2.5x greater risk, the relative risk is a smaller 1.3. While the absolute risk increase is larger (0.1 percentage points versus 0.05), the number is still minuscule and the NNH is still extremely large at 1,000 (the actual number is somewhat bigger because I rounded the incidence rates).
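The HealthVerity figures can be checked the same way. The short sketch below, using only the counts quoted in this article, also shows how much rounding the incidence rates to one decimal place moves the NNH. Variable names are my own.

```python
# HealthVerity counts as quoted in the article
positive_cases, positive_total = 1_120, 280_767
negative_cases, negative_total = 853, 281_072

risk_positive = positive_cases / positive_total
risk_negative = negative_cases / negative_total

relative_risk = risk_positive / risk_negative
risk_difference = risk_positive - risk_negative
nnh_unrounded = 1 / risk_difference

# Same calculation, but rounding each incidence rate to one decimal place
# (in percent) before subtracting, as done in the text
rounded_difference = (
    round(risk_positive * 100, 1) - round(risk_negative * 100, 1)
) / 100
nnh_rounded = 1 / rounded_difference

print(f"Relative risk:          {relative_risk:.2f}")
print(f"Absolute risk increase: {risk_difference * 100:.3f} percentage points")
print(f"NNH (unrounded rates):  {nnh_unrounded:.0f}")
print(f"NNH (rounded rates):    {nnh_rounded:.0f}")
```

The unrounded NNH lands a little under 1,050, so rounding shifts the figure by only a few percent and does not change the overall picture.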
What did the CDC do on Twitter? It selectively published the 2.5 relative risk from just the IQVIA database, possibly to maximize sensationalism. Did it ever mention that HealthVerity showed a 1.3 relative risk? No. Did the CDC ever mention that along with the 2.5 relative risk, the absolute risk increase was 0.05 percentage points and the NNH was 2,000? It did not.
This was not the only major flaw. The databases only counted individuals who got tested and tested positive as SARS-COV-2 positive. However, it is likely that a significant number of people had the virus but never sought testing. What the study should have done is incorporate seroprevalence, which uses antibody testing to estimate the percentage of a population (or sub-population) that has had SARS-COV-2.
Doing so would make the number of SARS-COV-2 positive individuals larger, which would lower the incidence rate of diabetes in that group. To what degree? We do not know, but we cannot discount the possibility that accounting for seroprevalence would yield significantly different numbers.
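To get a feel for how sensitive the incidence rate is to undercounted infections, here is a rough Python sketch. It rests on assumptions that are mine alone: the ascertainment fractions below are hypothetical values chosen for illustration (they do not come from any seroprevalence survey), and the number of diabetes diagnoses is held fixed at the IQVIA count, which is a strong simplification. It also ignores the fact that untested infected individuals may sit in the comparison group.

```python
def adjusted_incidence(cases, tested_positive, ascertainment_fraction):
    """Incidence if only `ascertainment_fraction` of true infections were
    captured by testing, assuming no additional diabetes cases among the
    uncounted infections (a hypothetical simplification)."""
    if not 0 < ascertainment_fraction <= 1:
        raise ValueError("ascertainment_fraction must be in (0, 1]")
    estimated_infected = tested_positive / ascertainment_fraction
    return cases / estimated_infected


# IQVIA counts for the test-positive group, as quoted in the article
cases, tested_positive = 68, 80_893

# Hypothetical ascertainment fractions, for illustration only
for fraction in (1.0, 0.75, 0.5, 0.25):
    incidence = adjusted_incidence(cases, tested_positive, fraction)
    print(f"ascertainment {fraction:.0%}: incidence {incidence:.3%}")
```

Under these assumptions the incidence rate scales directly with the ascertainment fraction, which is why a real analysis would need actual seroprevalence estimates rather than guesses.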
Another issue was the lack of sub-population analysis. Only age and sex were looked at. However, there are several confounding variables that may affect the incidence rates such as living situations, wealth, and food accessibility. Could most of the increased risk for diabetes post-COVID infection be attributed to lower socioeconomic status? It is definitely possible, but unfortunately, the CDC did not perform such an analysis to prove or disprove the theory.
We also have no idea whether, for instance, the wealth distribution is the same in the SARS-COV-2 positive and negative groups. If the distributions of different socioeconomic factors are not the same across both groups, then that will undoubtedly skew the incidence rates.
To be fair to the study, its discussion section did acknowledge some of the limitations I stated earlier. However, acknowledging that your study has limitations does not automatically make it publishable. If I were one of the authors, I would have attempted to analyze the IQVIA and HealthVerity data with seroprevalence. On top of that, I would have also analyzed the effect of socioeconomic factors on the incidence rates. Lastly, if I were to share the results on Twitter, I would have shared the absolute risk and NNH numbers along with the relative risk instead of selectively going for the most "clickbaity" number.
You can also read Dr. Vinay Prasad's article on the same study on Substack. Some of our complaints overlap, but he provides some additional points that I did not make here.