Simple models vs. deep learning in detecting low ejection fraction from the electrocardiogram

John Weston Hughes; Sulaiman Somani; Pierre Elias; James Tooley; Albert J Rogers; Timothy Poterucha; Christopher M Haggerty; Michael Salerno; David Ouyang; Euan Ashley; James Zou; Marco V Perez

doi:10.1093/ehjdh/ztae034

Eur Heart J Digit Health. 2024 Apr 25;5(4):427-434. doi: 10.1093/ehjdh/ztae034. eCollection 2024 Jul.

Affiliations

¹ Department of Computer Science, Stanford University, 353 Jane Stanford Way, Stanford, CA 94305, USA.
² Department of Medicine, Stanford University, 1265 Pasteur Dr, Stanford, CA 94305, USA.
³ Department of Medicine, Columbia University Irving Medical Center, 622 W 168th St, New York, NY 10032, USA.
⁴ Cedars-Sinai Medical Center, Department of Cardiology, Smidt Heart Institute, 127 S San Vicente Blvd Pavilion, Suite A3600, Los Angeles, CA 90048, USA.
⁵ Department of Biomedical Data Science, Stanford University, 1265 Welch Road, Stanford, CA 94305, USA.

Abstract

Aims: Deep learning methods have recently proved successful in detecting left ventricular systolic dysfunction (LVSD) from electrocardiogram (ECG) waveforms. Despite their high accuracy, they are difficult to interpret and to deploy broadly in the clinical setting. In this study, we set out to determine whether simpler models based on standard ECG measurements could detect LVSD with accuracy similar to that of deep learning models.

Methods and results: Using an observational data set of 40 994 matched 12-lead ECGs and transthoracic echocardiograms, of which 9.72% had LVSD, we trained a range of models of increasing complexity to detect LVSD from ECG waveforms and derived measurements. The training data were acquired from the Stanford University Medical Center. External validation data were acquired from the Columbia Medical Center and the UK Biobank. A random forest model using 555 discrete, automated measurements achieved an area under the receiver operating characteristic curve (AUC) of 0.92 (0.91-0.93), similar to a deep learning waveform model with an AUC of 0.94 (0.93-0.94). A logistic regression model based on five measurements achieved high performance [AUC of 0.86 (0.85-0.87)], close to that of a deep learning model and better than that of N-terminal prohormone brain natriuretic peptide (NT-proBNP). Finally, in experiments at two independent external sites, we found that simpler models were more portable across sites.
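The results above are reported as an AUC with an interval in parentheses. As a minimal illustrative sketch (not the authors' code), the Python below computes an AUC from labels and model scores using the rank-based (Mann-Whitney) formulation and estimates an interval with a percentile bootstrap. The abstract does not state how the intervals were obtained, so the bootstrap is an assumption. The function names `auc` and `bootstrap_ci` are hypothetical, and the data are synthetic.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    pairs = sorted(zip(scores, labels))
    rank_sum = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1
        # Tied scores share the average of the 1-based ranks i+1 .. j.
        avg_rank = (i + 1 + j) / 2.0
        rank_sum += avg_rank * sum(label for _, label in pairs[i:j])
        i = j
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method;
    the source does not specify how its intervals were computed)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[k] for k in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc(ys, [scores[k] for k in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Synthetic example only: about 10% positives, with scores shifted
    # upward for positive cases. These are not study data.
    rng = random.Random(42)
    labels = [1 if rng.random() < 0.10 else 0 for _ in range(2000)]
    scores = [rng.gauss(1.5 if y else 0.0, 1.0) for y in labels]
    point = auc(labels, scores)
    lower, upper = bootstrap_ci(labels, scores)
    print(f"AUC {point:.2f} ({lower:.2f}-{upper:.2f})")
```

The rank-based form avoids tracing the ROC curve explicitly and handles tied scores, which matter for models built on a small number of discrete measurements.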

Conclusion: Our study demonstrates the value of simple electrocardiographic models that perform nearly as well as deep learning models, while being much easier to implement and interpret.

Keywords: Artificial intelligence; Deep learning; Electrocardiograms; Explainability; Interpretability.
