Effective Data Handling To Improve Patient Outcomes In The Era Of COVID-19

Dr. Piyush Mathur is the founder of BrainX, a collaborative platform for physician researchers and innovators that has come together to create the next generation of data handling and AI applications for healthcare. Dr. Ashish Khanna is the founding partner of BrainX. (More information at: www.Brainxai.com.)


Among its many impacts, COVID-19 has spawned a plethora of early data and literature. Data and literature that are not of high quality may hinder progress in understanding the disease. Critical care and, more broadly, perioperative medicine are clinical arenas that generate massive volumes of data. As we routinely care for patients with COVID-19 in those settings, these data hold promise to further our understanding of the disease.

Critical care data – where are we today?

A PubMed search at the time of writing for “COVID-19” yielded more than 16,000 results, of which nearly 14,000 have been published since April 1, 2020. Although some randomized controlled trials (RCTs) exist, nearly all these publications are observational.

The combination of proliferating observational findings, amplification via social media, and pressure to urgently conceptualize optimal treatment modalities can promote the well-intentioned but unfortunate spread of misinformation and even disinformation.5 Clinical adoption of treatments despite lack of RCT evidence can lead directly to patient harm. Many proposed therapeutics have significant adverse effects, which can be particularly detrimental to patients at baseline risk of morbidity and mortality from COVID-19 (e.g., the elderly and those with cardiac comorbidities).6 Unnecessary use of these drugs may also create downstream problems, such as shortages for approved indications (e.g., hydroxychloroquine for lupus).

RCTs, while challenging to design and implement, are a robust and effective tool for distinguishing harmful from beneficial therapies. In contrast, a recent non-randomized clinical trial observed decreased SARS-CoV-2 viral load in patients treated with hydroxychloroquine after excluding six patients from the treatment group.7 If these six patients had been included, the treatment group would have demonstrated greater harm than benefit: the number needed to harm with hydroxychloroquine would have been approximately six (19.2% in hydroxychloroquine vs 0% control), whereas the analysis as reported showed no excess harm.
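The number needed to harm follows directly from the two event rates quoted above: it is the reciprocal of the absolute risk increase. The short sketch below is illustrative only; the function name is ours, and the only inputs are the 19.2% and 0% figures from the text. It also shows why a comparison with no excess harm has an undefined number needed to harm rather than a value of zero.

```python
import math


def number_needed_to_harm(risk_treated, risk_control):
    """Return 1 / absolute risk increase, or None when there is no excess risk."""
    absolute_risk_increase = risk_treated - risk_control
    if absolute_risk_increase <= 0:
        # No excess harm in the treated group: NNH is undefined, not zero.
        return None
    return 1.0 / absolute_risk_increase


# Event rates as quoted in the text (19.2% treated vs 0% control).
nnh = number_needed_to_harm(0.192, 0.0)
print(round(nnh, 2), math.ceil(nnh))  # 5.21 6
```

Rounded up to a whole patient, the result matches the value of six cited above.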

In the context of a novel pandemic disease, the time lag associated with RCTs – stemming from institutional review board approval, study design, funding, enrollment, time needed to treat, analysis, etc. – requires alternative research approaches that still generate reliable findings from observational data. One such approach is data registry analysis, as was undertaken by Mehra et al. Using a registry of over 96,000 hospitalized patients, the authors were able to show that hydroxychloroquine was associated with no benefit, more ventricular arrhythmias, and an increased risk of in-hospital death.8 However, the validity of the underlying data and methodology has subsequently been called into question, which led to a high-profile retraction. This example highlights the importance of registry data quality. While awaiting RCTs, increasing the availability of authentic, clean, and large datasets may be the key to rapidly increasing scientific understanding of the disease and its treatments.

The evolution of COVID-19 registry data

With the explosion of both raw and processed COVID-19 data, there is a commensurate need for systems to facilitate entry, storage, access, and processing of this information. In turn, accurate and reliable data can be used for research, operations, and predictive modeling of important patient-centric outcomes. Well-established large critical care datasets (e.g., MIMIC-III), while valuable, cannot immediately meet COVID-19 operations and research needs. Furthermore, public and commercial critical care datasets are not updated in real time.

The Viral Infection and Respiratory Illness Universal Study: COVID-19 (VIRUS) Registry is a collaborative effort intended to meet these needs.9 Investigators at the Mayo Clinic and Boston University, in partnership with the SCCM Discovery Research Network, have aimed to create a registry of all eligible adult and pediatric patients hospitalized with suspected or confirmed COVID-19. These data will support the conduct of a cross-sectional, observational study. Another tangible aim for this work is near-real-time observational comparative effectiveness analysis to determine effective treatment strategies and/or provide meaningful hypotheses for future clinical trials. At the time of writing, the VIRUS Registry contains data from over 6,000 patients contributed by more than 500 collaborators. A data dashboard is available online: https://sccmcovid19.org.

The Extracorporeal Life Support Organization (ELSO) has long maintained an international registry of patients who receive extracorporeal membrane oxygenation (ECMO) modalities. While ECMO is not new to critical care medicine, the use of an expensive and limited resource in a clinical situation as complicated as COVID-19 demands a deeper understanding of disease-specific outcomes. As such, ELSO created an ECMO registry specific to COVID-19 that is updated in real time for analytics and outcome modeling. At the time of writing, data from about 1,100 patients with COVID-19 who received ECMO demonstrated a 53% survival to discharge rate. A data dashboard is available online: https://www.elso.org/Registry/FullCOVID19RegistryDashboard.aspx

Data overload in the pandemic – will artificial intelligence help?

Major challenges with COVID-19 data include accuracy in reporting, missing data, and timeliness of availability, even in some of the commonly used public datasets. Access to clinical datasets remains challenging despite calls to improve accessibility.1
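Two of the challenges named above, missing data and timeliness, can be quantified with a very simple audit before a dataset is used for analysis. The sketch below is a minimal illustration under stated assumptions: the field names (`age`, `outcome`, `last_updated`), the seven-day staleness threshold, and the four sample records are all invented for the example and do not describe any real registry.

```python
from datetime import date


def audit_records(records, required_fields, as_of, max_age_days=7):
    """Report the fraction of missing values per field and the fraction of stale records."""
    n = len(records)
    missing = {field: 0 for field in required_fields}
    stale = 0
    for rec in records:
        for field in required_fields:
            if rec.get(field) in (None, ""):
                missing[field] += 1
        updated = rec.get("last_updated")
        # A record with no update date, or an old one, counts as stale.
        if updated is None or (as_of - updated).days > max_age_days:
            stale += 1
    return {
        "missing_fraction": {f: (missing[f] / n if n else 0.0) for f in required_fields},
        "stale_fraction": stale / n if n else 0.0,
    }


# Hypothetical records, for illustration only.
records = [
    {"age": 64, "outcome": "discharged", "last_updated": date(2020, 6, 1)},
    {"age": None, "outcome": "", "last_updated": date(2020, 5, 1)},
    {"age": 71, "outcome": None, "last_updated": date(2020, 5, 30)},
    {"age": 58, "outcome": "died", "last_updated": None},
]
report = audit_records(records, ["age", "outcome"], as_of=date(2020, 6, 2))
print(report)
```

A report like this does not address accuracy of reporting, which generally requires comparison against source records.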

As previously stated, there has been an exponential rise in potentially relevant pre-print and peer-reviewed literature. For example, the recently released COVID-19 Open Research Dataset (CORD-19) includes over 24,000 research papers from peer-reviewed journals and pre-print servers (e.g., bioRxiv and medRxiv). The pressing need is to convert some or all of these findings into meaningful information. To that end, natural language processing techniques have been developed and employed.2
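As one generic illustration of how natural language processing can distill a large body of literature, the sketch below ranks the distinguishing terms of each document with a basic TF-IDF weighting. It is not the method used by the cited application; the three one-line "abstracts" and the small stopword list are invented for the example, and a real pipeline over CORD-19 would need far more careful text handling.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "and", "in", "of", "the", "with"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weights.append(
            {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        )
    return weights


def top_terms(weight_dict, k=3):
    """Highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(weight_dict.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Invented example texts, for illustration only.
docs = [
    "hydroxychloroquine and arrhythmia in hospitalized covid patients",
    "ecmo survival in covid patients with respiratory failure",
    "machine learning models of covid mortality",
]
weights = tfidf(docs)
for w in weights:
    print(top_terms(w))
```

Because "covid" appears in every document, its weight is zero and it never ranks among the top terms, which is the behavior one wants when every paper in the corpus is about the same disease.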

Multiple models have been created to predict the spread of the disease and its associated outcomes, such as hospital resource utilization and death. It has been challenging to build models for prediction of mortality with ever-evolving data and sometimes incomplete datasets.3 Beyond predicting death, explainable machine learning models that describe key features, with relative ratios of importance of these features, are important for making policies to contain the spread and improve outcomes.4 Machine learning models, which have the ability to adjust to constantly evolving data, can support rapid cycle improvement needs, be implemented universally, and be scaled for every region. This includes high-demand areas, such as intensive care units.
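To make the idea of relative feature importance concrete, the sketch below uses permutation importance: each feature column is shuffled in turn, and the resulting drop in accuracy is taken as that feature's importance. This is one generic technique chosen for illustration, not necessarily the approach taken in the cited work. The toy "model" (a fixed age threshold), the two features, and the six data rows are all invented.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, shuffled, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier: predicts death (1) from age alone."""
    return 1 if row[0] >= 65 else 0


# Invented data: [age, arbitrary binary flag]; label 1 = died.
X = [[30, 1], [45, 0], [70, 1], [80, 0], [55, 1], [90, 0]]
y = [0, 0, 1, 1, 0, 1]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

The second feature receives an importance of exactly zero because the toy model ignores it, while shuffling age degrades accuracy. Dividing each value by the sum of all values would give the relative ratios described above.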

Lessons learned

Lessons from COVID-19 overlap with those learned from prior experience with big data: we need clean, high-quality, guideline-based, regularly updated, dynamic datasets that are readily and freely accessible. We can then use machine learning and natural language processing tools to leverage and translate data into meaningful information. Needless to say, teamwork is the essence of the practice of critical care medicine. Data handling during and after the pandemic will need the same level of collaboration between data scientists and clinicians, through open platforms, to achieve the best outcomes and truly help our patients.

  1. Cosgriff CV, Ebner DK, Celi LA: Data sharing in the era of COVID-19. Lancet Digit Health 2020; 2: e224
  2. Awasthi R, Pal R, Singh P, Nagori A, Reddy S, Gulati A, Kumaraguru P, Sethi T: CovidNLP: A Web Application for Distilling Systemic Implications of COVID-19 Pandemic with Natural Language Processing. medRxiv 2020: 2020.04.25.20079129
  3. Jewell NP, Lewnard JA, Jewell BL: Caution Warranted: Using the Institute for Health Metrics and Evaluation Model for Predicting the Course of the COVID-19 Pandemic. Ann Intern Med 2020
  4. Mathur P, Sethi T, Mathur A, Khanna AK, Maheshwari K, Cywinski JB, Dua S, Papay F: Explainable machine learning models to understand determinants of COVID-19 mortality in the United States. medRxiv 2020: 2020.05.23.20110189
  5. Ingraham NE, Tignanelli CJ: Fact Versus Science Fiction: Fighting Coronavirus Disease 2019 Requires the Wisdom to Know the Difference. Crit Care Explor 2020; 2: e0108
  6. Kalil AC: Treating COVID-19-Off-Label Drug Use, Compassionate Use, and Randomized Clinical Trials During Pandemics. JAMA 2020
  7. Gautret P, Lagier JC, Parola P, Hoang VT, Meddeb L, Mailhe M, Doudier B, Courjon J, Giordanengo V, Vieira VE, Dupont HT, Honore S, Colson P, Chabriere E, La Scola B, Rolain JM, Brouqui P, Raoult D: Hydroxychloroquine and azithromycin as a treatment of COVID-19: results of an open-label non-randomized clinical trial. Int J Antimicrob Agents 2020: 105949
  8. Mehra MR, Desai SS, Ruschitzka F, Patel AN: Hydroxychloroquine or chloroquine with or without a macrolide for treatment of COVID-19: a multinational registry analysis. Lancet 2020
  9. Walkey AJ, Kumar VK, Harhay MO, Bolesta S, Bansal V, Gajic O, Kashyap R: The Viral Infection and Respiratory Illness Universal Study (VIRUS): An International Registry of Coronavirus 2019-Related Critical Illness. Crit Care Explor 2020; 2: e0113


Dustin Rumpel, MD
Department of Intensive Care and Resuscitation, Anesthesiology Institute, Cleveland Clinic
Cleveland, Ohio
Piyush Mathur, MD, FCCM
Department of General Anesthesiology, Anesthesiology Institute, Cleveland Clinic
Cleveland, Ohio
Ashish K. Khanna, MD, FCCP, FCCM
Wake Forest University School of Medicine, Center for Bioinformatics, and Center for Healthcare Innovation
Winston-Salem, North Carolina