Inflation of type I error rates due to differential misclassification in EHR-derived outcomes: Empirical illustration using breast cancer recurrence

Yong Chen; Jianqiao Wang; Jessica Chubak; Rebecca A Hubbard

doi:10.1002/pds.4680

Pharmacoepidemiol Drug Saf. 2019 Feb;28(2):264-268. doi: 10.1002/pds.4680. Epub 2018 Oct 30.

Authors

Yong Chen¹, Jianqiao Wang¹, Jessica Chubak², Rebecca A Hubbard¹

Affiliations

¹ Department of Biostatistics, Epidemiology, and Informatics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA.
² Kaiser Permanente Washington Health Research Institute, Kaiser Permanente Washington, Seattle, WA, USA.

Abstract

Purpose: Many outcomes derived from electronic health records (EHR) not only are imperfect but also may suffer from exposure-dependent differential misclassification due to variability in the quality and availability of EHR data across exposure groups. The objective of this study was to quantify the inflation of type I error rates that can result from differential outcome misclassification.

Methods: We used data on gold-standard and EHR-derived second breast cancers in a cohort of women with a prior breast cancer diagnosis from 1993 to 2006 enrolled in Kaiser Permanente Washington. We simulated an exposure that was independent of the true outcome status. A surrogate outcome was then simulated with varying sensitivity and specificity according to exposure status. We estimated the type I error rate for a test of association relating this exposure to the surrogate outcome, while varying outcome sensitivity and specificity in exposed individuals.
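The simulation design described above can be illustrated with a minimal sketch. It is not the authors' code and rests on several assumptions that are not in the abstract: the true outcome is simulated with a hypothetical prevalence instead of being taken from the gold-standard cohort data, the cohort size and exposure probability are arbitrary, and the test of association is a pooled two-proportion z-test. The function and parameter names are illustrative only.

```python
import math
import random


def two_proportion_p_value(x1, n1, x0, n0):
    """Two-sided p-value from a pooled two-proportion z-test."""
    if n1 == 0 or n0 == 0:
        return 1.0
    pooled = (x1 + x0) / (n1 + n0)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x0 / n0) / se
    # P(|Z| > |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def type_i_error_rate(n=1000, prevalence=0.1, p_exposed=0.5,
                      sens=(0.85, 0.85), spec=(0.90, 0.80),
                      n_sims=500, alpha=0.05, seed=0):
    """Estimate how often a null exposure is declared associated.

    sens and spec are (unexposed, exposed) pairs. n, prevalence and
    p_exposed are hypothetical values chosen for illustration.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        # counts[x] = [group size, surrogate-positive count]
        counts = [[0, 0], [0, 0]]
        for _ in range(n):
            y = rng.random() < prevalence             # true outcome (simulated here)
            x = 1 if rng.random() < p_exposed else 0  # exposure, independent of y
            # Surrogate is positive with probability Se_x if y, else 1 - Sp_x
            p_positive = sens[x] if y else 1 - spec[x]
            y_star = rng.random() < p_positive
            counts[x][0] += 1
            counts[x][1] += y_star
        p_value = two_proportion_p_value(counts[1][1], counts[1][0],
                                         counts[0][1], counts[0][0])
        rejections += p_value < alpha
    return rejections / n_sims


if __name__ == "__main__":
    print("nondifferential:", type_i_error_rate(spec=(0.90, 0.90)))
    print("differential specificity:", type_i_error_rate(spec=(0.90, 0.80)))
```

Because the exposure is generated independently of the true outcome, every rejection is a false positive, so the rejection fraction estimates the type I error rate. The values it produces depend on the assumed cohort size and prevalence and should not be expected to match the figures reported in the study.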

Results: Type I error rates were substantially inflated above the nominal level (5%) even for modest departures from nondifferential misclassification. Holding sensitivity in the exposed and unexposed groups at 85%, a 10 percentage point difference in specificity between the exposed and unexposed (80% vs 90%) resulted in a 36% type I error rate. Type I error was inflated more by differential specificity than by differential sensitivity.
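One way to see why differential specificity can matter more is a short derivation, offered as a sketch and not taken from the paper. The symbols are introduced here for illustration: X is the exposure, Y the true outcome, Y* the surrogate outcome, p the true outcome prevalence, and Se_x and Sp_x the sensitivity and specificity in exposure group x. It assumes X is independent of Y.

```latex
\begin{align*}
P(Y^* = 1 \mid X = x) &= \mathrm{Se}_x \, p + (1 - \mathrm{Sp}_x)(1 - p) \\
\Delta &= P(Y^* = 1 \mid X = 1) - P(Y^* = 1 \mid X = 0) \\
       &= p \, (\mathrm{Se}_1 - \mathrm{Se}_0) + (1 - p)(\mathrm{Sp}_0 - \mathrm{Sp}_1)
\end{align*}
```

With sensitivity equal in both groups and specificity of 80% vs 90%, the spurious difference in surrogate outcome prevalence is 0.10(1 - p). Whenever p is below 0.5, a given gap in specificity is weighted more heavily (by 1 - p) than the same gap in sensitivity (weighted by p), which is consistent with the pattern reported above.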

Conclusions: Differential outcome misclassification may induce spurious findings. Researchers using EHR-derived outcomes should use misclassification-adjusted methods whenever possible or conduct sensitivity analyses to investigate the possibility of false-positive findings, especially for exposures that may be related to the accuracy of outcome ascertainment.

Keywords: electronic health record; misclassification; outcome; pharmacoepidemiology; phenotype; validation.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Aged
Bias
Breast Neoplasms / epidemiology*
Cohort Studies
Computer Simulation
Data Accuracy
Data Interpretation, Statistical
Electronic Health Records / statistics & numerical data*
Female
Humans
Middle Aged
Models, Statistical
Neoplasm Recurrence, Local / epidemiology*
Outcome Assessment, Health Care / methods
Outcome Assessment, Health Care / statistics & numerical data*
Pharmacoepidemiology / methods
Pharmacoepidemiology / statistics & numerical data*
Sensitivity and Specificity
Washington / epidemiology
