Long COVID Persistence and Surveillance Gaps Across 58 US Hospitals, 2026, Tian

rvallee

Administrator
Staff member
Long COVID Persistence and Surveillance Gaps Across 58 US Hospitals
Jiazi Tian, MSc; Alaleh Azhir, MD, MSc; Matthew Decaro, MSc; Ngan Chau, BS; Jonas Hügel, PhD; Michele Morris, BA; Jingya Cheng, MB; Pedram Fard, PhD; Ingrid V. Bassett, MD, MPH; Douglas S. Bell, MD, PhD; Elmer V. Bernstam, MD, MSE; Shyam Visweswaran, MD, PhD; Jeffrey G. Klann, PhD; Shawn N. Murphy, MD, PhD; Hossein Estiri, PhD

Key Points

Question
What is the true burden of chronic disease following COVID-19, and why does current surveillance fail to capture it?

Findings
In this cohort study of 457 950 patients with COVID-19 across 58 hospitals, validated computable phenotyping identified postacute sequelae of SARS-CoV-2 infection in 16.28% of cases, 2-fold higher than diagnostic code–based surveillance. Of identified manifestations, 89.31% represented chronic conditions, with prevalence increasing through mid-2024.

Meaning
These findings suggest that approximately 1 in 6 patients with COVID-19 develops postacute sequelae, predominantly chronic conditions currently invisible to surveillance systems, representing an accumulating rather than resolving health care burden.


Abstract

Importance
Surveillance of postacute sequelae of SARS-CoV-2 infection (PASC) depends on diagnostic coding systems that capture fewer than one-half of affected individuals, rendering millions invisible to health systems and policymakers.

Objective
To quantify the gap between true PASC burden and diagnostic code–based estimates, determine the proportion representing chronic disease, and characterize organ system heterogeneity and temporal trends across diverse populations.

Design, Setting, and Participants
This retrospective cohort study used electronic health record data from 58 hospitals and affiliated clinics in 4 US regions, from 2017 to 2025. Adults (aged ≥18 years) with laboratory-confirmed SARS-CoV-2 infection or a COVID-19 diagnosis code were included. A custom artificial intelligence algorithm, the Precision Phenotyping for Research Cohorts (P2RC), was implemented using federated infrastructure.

Exposure
Laboratory-confirmed SARS-CoV-2 infection or COVID-19 diagnosis code.

Main Outcomes and Measures
The primary outcomes were PASC prevalence, the proportion classified as chronic conditions, organ system distribution, and temporal trends from 2020 to 2024. χ² tests were used to assess organ system heterogeneity across regions, and negative binomial regression was used to model quarterly temporal trends, yielding incidence rate ratios (IRRs) with 95% CIs.

Results
In this cohort study of 457 950 COVID-19 cases (mean age, 52.05 years; 275 107 [60.07%] female), the P2RC algorithm identified 74 560 PASC cases (16.28% overall; 28 585 [18.58%] in New England, 978 [19.55%] in Southeast Texas, 10 534 [22.69%] in Southern California, and 34 463 [13.64%] in Western Pennsylvania), more than 2-fold higher than the proportion identified by code-based surveillance (<7%). Of 883 International Statistical Classification of Diseases, Tenth Revision, Clinical Modification codes associated with PASC, 594 (67.27%) represented chronic or potentially chronic conditions. Of 74 560 patients with PASC, 66 587 (89.31%) developed chronic conditions requiring ongoing clinical management; this represents 14.54% of the total number of 457 950 patients with COVID-19. Substantial organ system heterogeneity was observed (χ² = 2504.73; P < .001): New England demonstrated thyroid-predominant endocrine patterns, while Southeast Texas, Southern California, and Western Pennsylvania showed metabolic-predominant profiles. Negative binomial regression revealed increasing PASC prevalence through mid-2024 (IRR per quarter, 1.01 [95% CI, 1.00-1.01; P < .001] in New England; 1.00 [95% CI, 1.00-1.01; P < .001] in Southern California; and 1.02 [95% CI, 1.01-1.02; P < .001] in Western Pennsylvania), indicating an accumulating rather than resolving burden.
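The headline percentages in the Results can be rechecked from the counts reported there. The following is a minimal Python sketch of that arithmetic only; the variable and function names are my own and are not taken from the paper, and the inputs are limited to the counts quoted above.

```python
# Counts quoted in the abstract's Results paragraph.
total_covid = 457_950
pasc_by_region = {
    "New England": 28_585,
    "Southeast Texas": 978,
    "Southern California": 10_534,
    "Western Pennsylvania": 34_463,
}
chronic_pasc = 66_587


def pct(part, whole):
    """Percentage rounded to two decimals, the precision used in the abstract."""
    return round(100 * part / whole, 2)


# The regional PASC counts should add up to the overall PASC count.
pasc_total = sum(pasc_by_region.values())

pasc_pct = pct(pasc_total, total_covid)            # PASC among all COVID-19 cases
chronic_of_pasc_pct = pct(chronic_pasc, pasc_total)  # chronic among PASC cases
chronic_of_all_pct = pct(chronic_pasc, total_covid)  # chronic among all COVID-19 cases

print(pasc_total, pasc_pct, chronic_of_pasc_pct, chronic_of_all_pct)
```

The regional counts sum to the reported 74 560 PASC cases, and the three percentages match the reported 16.28%, 89.31%, and 14.54%.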

Conclusions and Relevance
In this cohort study, approximately 1 in 6 patients with COVID-19 developed PASC, and 89.31% of these patients had at least 1 chronic condition. Current diagnostic coding captured fewer than one-half of the cases, obscuring a substantial chronic disease burden. The persistently increasing prevalence through 2024 indicated an accumulating health care burden requiring investment in surveillance infrastructure and integrated care pathways.
 

News Release 27-May-2026

Long Covid burden continues to grow, doubling current surveillance estimates, multi-hospital study shows

Mass General Brigham-developed AI tool identified over 10 million cases undetected by existing diagnostic systems

Peer-Reviewed Publication
Mass General Brigham



The true toll of long COVID may be double that of current estimates and hidden from current surveillance systems that rely on capturing diagnostic codes, according to new research led by Mass General Brigham. Investigators used a novel AI algorithm to comb through medical records of nearly 460,000 patients with COVID-19 across 58 U.S. hospitals, finding approximately 1-in-6, or roughly 16%, developed long COVID. These rates, which translate to more than 18 million Americans, are twofold higher than current estimates and reflect the growing cumulative burden of chronic conditions following COVID-19 infection. Results are published in JAMA Network Open.

“Over 10 million people with long COVID would go entirely undetected by the diagnostic code that health systems and policymakers rely on to track the disease burden,” said study corresponding author Hossein Estiri, PhD, a faculty member in the Mass General Brigham Department of Medicine. “The figures we uncovered are almost certainly an undercount.”

Current diagnostic coding, including the ICD code U09.9 designated for post-COVID conditions, identifies long COVID in fewer than 7% of patients with COVID-19, less than half the 16.3% found by the algorithm.

Mass General Brigham researchers deployed a novel “precision-phenotyping” algorithm they designed specifically to identify long COVID in longitudinal electronic health records by analyzing temporal sequences of clinical events from hundreds of thousands of COVID-19 patients. The algorithm was previously validated to identify long COVID as a diagnosis of exclusion: it flags conditions that appeared after COVID-19 infection and cannot be explained by conditions already in a patient's medical history.
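To make the "diagnosis of exclusion" idea concrete, here is a toy Python sketch. It is not the P2RC algorithm, whose internals are not described in this release. It assumes a hypothetical record format of (code, date) pairs and reduces "cannot be explained by the patient's history" to "was never recorded before the infection date". The patient data are invented for illustration.

```python
from datetime import date


def new_conditions_after_infection(records, infection_date):
    """Return codes recorded after infection_date that never appear before it.

    `records` is a hypothetical list of (diagnosis_code, date) pairs.
    """
    prior = {code for code, when in records if when < infection_date}
    later = {code for code, when in records if when > infection_date}
    # Exclusion step: drop anything already present in the earlier history.
    return sorted(later - prior)


# Invented example patient, infected on 2020-11-15.
records = [
    ("E11", date(2019, 5, 1)),   # type 2 diabetes, already known before infection
    ("E11", date(2021, 3, 1)),   # same condition seen again afterwards
    ("R53", date(2021, 2, 10)),  # malaise and fatigue, first seen after infection
    ("G90", date(2021, 6, 5)),   # autonomic nervous system disorder, first seen after
]

candidates = new_conditions_after_infection(records, date(2020, 11, 15))
print(candidates)  # ['G90', 'R53']
```

The preexisting diabetes code is excluded even though it recurs after infection, while the two conditions with no earlier record remain as candidates. A real phenotyping method would need much more than this, for example time windows and alternative explanations for each new condition.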

Researchers analyzed electronic health records from 457,950 patients who had previously tested positive for COVID-19 across four U.S. regions: New England, Southeast Texas, Southern California and Western Pennsylvania. They identified long COVID in 16.3% of patients overall, with rates ranging from 13.6% to 22.7% across regions. Across the full study cohort, 14.5% of COVID-19 patients (66,587 individuals) developed chronic conditions requiring sustained clinical care. The study also uncovered regional variations of long COVID clinical manifestations, such as dramatically different rates of prediabetes – an emerging sequela of long COVID – across various parts of the U.S.

Contrary to the assumption that long COVID is a legacy of early waves of the pandemic, the researchers also found that cumulative prevalence continued to increase across all regions studied. This indicates the virus continues to act as a catalyst for new, long-term chronic health conditions impacting different systems in the body. Statistical modeling showed significant quarterly increases in New England, Southern California and Western Pennsylvania, with trends pointing to continued growth over the next decade if current patterns persist.
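For a sense of scale, a per-quarter incidence rate ratio (IRR) compounds multiplicatively in a log-linear count model, so the expected ratio after n quarters is IRR to the power n. The sketch below applies that to the per-quarter IRRs reported in the abstract. The function name is my own, and the inputs are the published values rounded to two decimals, so the outputs are rough illustrations and not the study's projections.

```python
def cumulative_ratio(irr_per_quarter, quarters):
    """Expected multiplicative change after `quarters`, assuming a constant IRR."""
    return irr_per_quarter ** quarters


# Per-quarter IRRs as reported in the abstract (rounded to two decimals there).
reported_irr = {
    "New England": 1.01,
    "Southern California": 1.00,
    "Western Pennsylvania": 1.02,
}

# Compound each IRR over four quarters, i.e. one year.
one_year = {region: cumulative_ratio(irr, 4) for region, irr in reported_irr.items()}

for region, ratio in one_year.items():
    print(f"{region}: x{ratio:.3f} over 4 quarters")
```

On these rounded inputs, 1.02 per quarter is about an 8% rise over a year and 1.01 about 4%. The Southern California value of 1.00 is a rounding of a small positive trend, since its confidence interval runs from 1.00 to 1.01.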

“This work demonstrates how longitudinal clinical data in a health system can be structured and analyzed to support more consistent identification of complex post-viral conditions,” said Shawn Murphy, MD, PhD, study co-author and Chief Research Information Officer for University of Washington. “There is significant potential for clinical AI when it is designed for public health and integrated across real-world care settings.”

The researchers note that their findings do not include undocumented infections, which have become the majority since widespread testing ended, and exclude patients without longitudinal medical records. These limitations suggest the overall disease toll of long COVID may be even higher.

“These patients are not absent from clinical care; they are absent from the diagnostic code that would identify them as long COVID patients,” said lead study author Jiazi Tian, MSc, a data scientist in the Clinical Augmented Intelligence Group at Mass General Brigham. “The cardiologist seeing new dysautonomia, the endocrinologist seeing new metabolic disease, the neurologist seeing unexplained cognitive complaints — some of these presentations are long COVID arriving without the label that would connect them to a COVID-19 infection.”

“This study demonstrates how hospitals can leverage AI to help fill surveillance gaps that public health agencies are no longer tracking. What excites me most is what can come next with this new surveillance data,” said Estiri. “Once we can distinguish different clinical and organ-specific manifestations of long COVID, we gain the ability to launch new trials and test targeted treatments for the right patients.”

Authorship: In addition to Estiri, Murphy and Tian, research co-authors included Alaleh Azhir, Matthew Decaro, Ngan Chau, Jonas Hügel, Michele Morris, Jingya Cheng, Pedram Fard, Ingrid V. Bassett, Douglas S. Bell, Elmer V. Bernstam, Shyam Visweswaran, and Jeffrey G. Klann.
Disclosures: None
Funding: This study was supported by the National Institutes of Health/National Institute of Allergy and Infectious Diseases (R01 AI165535) and the National Center for Advancing Translational Sciences (U24 TR004111). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Paper cited: Tian J et al. “Long COVID Persistence and Surveillance Gaps Across 58 US Hospitals” JAMA Network Open DOI: 10.1001/jamanetworkopen.2026.14909.

###

About Mass General Brigham

Mass General Brigham is an integrated academic health care system, uniting great minds to solve the hardest problems in medicine for our communities and the world. Mass General Brigham connects a full continuum of care across a system of academic medical centers, community and specialty hospitals, a health insurance plan, physician networks, community health centers, home care, and long-term care services. Mass General Brigham is a nonprofit organization committed to patient care, research, teaching, and service to the community. In addition, Mass General Brigham is one of the nation’s leading biomedical research organizations with several Harvard Medical School teaching hospitals. For more information, please visit massgeneralbrigham.org.


Journal: JAMA Network Open
DOI: 10.1001/jamanetworkopen.2026.14909
Method of Research: Data/statistical analysis
Subject of Research: People
Article Title: Long COVID Persistence and Surveillance Gaps Across 58 US Hospitals
Article Publication Date: 27-May-2026
COI Statement: Dr Hügel reported receiving grants from the German Academic Exchange Service and the German Research Foundation during the conduct of the study. No other disclosures were reported.
 
I don't claim to understand all the details but I did like this quote (line breaks added):

A natural question arising from these prevalence estimates is whether health systems should already be experiencing overwhelming demand from patients with PASC.

We contend they are, but this burden manifests as unexplained increases in chronic disease management rather than as a discrete, labeled condition.

Patients with chronic postviral conditions present to primary care with fatigue, to cardiology with dysautonomia, to endocrinology with new-onset diabetes, and to neurology with cognitive complaints, without the diagnostic code connecting these presentations to antecedent infection.

Systematic underascertainment in surveillance systems does not mean these patients are absent from clinical care; rather, clinicians may recognize and manage post–COVID-19 conditions under alternative diagnostic codes, rendering the PASC burden invisible to population-level surveillance while remaining visible at the point of care.

This fragmentation across specialty silos impedes both epidemiologic surveillance and coordinated clinical management, and may partly explain observed postpandemic increases in diabetes, cardiovascular disease, and fatigue syndromes.

It makes sense to me that many patients are either undiagnosed or have various other diagnoses and that's why they don't show up as Long Covid patients.

And I just noticed the phrase "fatigue syndromes" in the last line. I guess that includes ME/CFS?
 