Application of a data continuity prediction algorithm to an electronic health record-based pharmacoepidemiology study

James H Flory; Yongkang Zhang; Samprit Banerjee; Fei Wang; Jea Y Min; Alvin I Mushlin

doi:10.1111/jep.14002

J Eval Clin Pract. 2024 Jun;30(4):716-725. doi: 10.1111/jep.14002. Epub 2024 May 2.

Authors

James H Flory¹, Yongkang Zhang², Samprit Banerjee², Fei Wang², Jea Y Min², Alvin I Mushlin²

Affiliations

¹ Endocrinology Service, Department of Subspecialty Medicine, Memorial Sloan Kettering Cancer Center, New York City, New York, USA.
² Department of Population Health Sciences, Weill Cornell Medical College, New York City, New York, USA.

PMID: 38696462

Abstract

Background and objectives: Use of algorithms to identify patients with high data continuity in electronic health records (EHRs) may increase study validity. Practical experience with this approach remains limited.

Methods: We developed and validated four algorithms to identify patients with high data continuity in an EHR-based data source. Selected algorithms were then applied to a pharmacoepidemiologic study comparing rates of COVID-19 hospitalization in patients exposed to insulin versus noninsulin antidiabetic drugs.

Results: A model using a short list of five EHR-derived variables performed as well as more complex models in distinguishing patients with high data continuity from those with low data continuity. Higher data continuity was associated with more accurate ascertainment of key variables. In the pharmacoepidemiologic study, patients with higher data continuity had higher observed rates of the COVID-19 outcome and showed a large unadjusted association between insulin use and the outcome, but no association after propensity score adjustment.
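As an illustration of how a large unadjusted association can disappear after propensity score adjustment, the following minimal sketch compares a crude risk ratio with a Mantel-Haenszel risk ratio pooled across propensity score strata. The counts are hypothetical and are not data from this study, and stratification is an assumed adjustment method chosen for simplicity; the abstract does not state which form of propensity score adjustment was used.

```python
# Hypothetical counts only; NOT data from the study.
# Each stratum is (events_exposed, n_exposed, events_unexposed, n_unexposed),
# where "exposed" stands for insulin users and strata stand for
# propensity score strata (an assumed adjustment method).
strata = [
    (2, 100, 18, 900),   # low propensity for insulin: 2% risk in both groups
    (40, 400, 10, 100),  # high propensity for insulin: 10% risk in both groups
]


def crude_risk_ratio(strata):
    """Risk ratio after collapsing all strata into a single 2x2 table."""
    events_exp = sum(s[0] for s in strata)
    n_exp = sum(s[1] for s in strata)
    events_unexp = sum(s[2] for s in strata)
    n_unexp = sum(s[3] for s in strata)
    return (events_exp / n_exp) / (events_unexp / n_unexp)


def mantel_haenszel_risk_ratio(strata):
    """Mantel-Haenszel pooled risk ratio across strata."""
    numerator = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    denominator = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return numerator / denominator


crude = crude_risk_ratio(strata)
adjusted = mantel_haenszel_risk_ratio(strata)
print(round(crude, 2), round(adjusted, 2))  # prints: 3.0 1.0
```

In this toy example the exposed group is concentrated in the higher-risk stratum, so the crude ratio is 3.0 even though the risk is identical for exposed and unexposed patients within each stratum, giving a pooled ratio of 1.0.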

Discussion: We found that a simple, portable algorithm to predict data continuity gave comparable performance to more complex methods. Use of the algorithm significantly impacted the results of an empirical study, with evidence of more valid results at higher levels of data continuity.

Keywords: COVID-19; antidiabetic drugs; data continuity; diabetes; electronic health record; missing data.

MeSH terms

Adult
Aged
Algorithms*
COVID-19 / epidemiology
Electronic Health Records* / statistics & numerical data
Female
Hospitalization / statistics & numerical data
Humans
Hypoglycemic Agents* / therapeutic use
Insulin / administration & dosage
Insulin / therapeutic use
Male
Middle Aged
Pharmacoepidemiology* / methods
SARS-CoV-2

Substances

Hypoglycemic Agents
Insulin

Grants and funding

PCORI/Patient-Centered Outcomes Research Institute/United States