Introduction: Increased use of electronic health records (EHRs) has enriched the databases available for building risk models. We used machine learning techniques to develop an EHR-based risk model, fitted locally to patients with type 2 diabetes mellitus (T2DM), for predicting cardiovascular disease.
Methods: This retrospective observational study was conducted within Ochsner Health, Louisiana, USA, between 2013 and 2017. The analysis included 6245 patients who had two outpatient diagnoses of T2DM recorded on separate days or a diagnosis recorded during an inpatient encounter. Baseline clinical data were limited to the 180 days before the index diagnosis. Cardiovascular outcomes were coronary heart disease (CHD), heart failure and stroke. Machine learning approaches were used to select predictor variables for Cox proportional hazards models for each outcome. The locally fitted equations were compared with "generalized" risk equations (RECODe, AS-CVD, QRISK3) in terms of model discrimination and calibration.
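The inclusion rule stated in the Methods can be illustrated with a minimal sketch. The encounter representation (a date paired with a `"outpatient"` or `"inpatient"` label) and the function name `meets_inclusion` are hypothetical, and "two outpatient diagnoses" is read here as "at least two distinct days"; the study's actual data extraction is not described at this level of detail.

```python
from datetime import date

def meets_inclusion(encounters):
    """Return True if a patient satisfies the stated T2DM inclusion rule.

    encounters: iterable of (encounter_date, setting) pairs, one per recorded
    T2DM diagnosis; setting is "outpatient" or "inpatient" (assumed encoding).
    """
    outpatient_days = set()
    for encounter_date, setting in encounters:
        if setting == "inpatient":
            # A single inpatient diagnosis is sufficient.
            return True
        if setting == "outpatient":
            outpatient_days.add(encounter_date)
    # Otherwise require outpatient diagnoses on at least two separate days.
    return len(outpatient_days) >= 2
```

Under this reading, two outpatient diagnoses recorded on the same day would not qualify a patient, whereas a single inpatient diagnosis would.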
Results: Among the factors included in the Ochsner (n = 11), RECODe (n = 14), AS-CVD (n = 15) and QRISK3 (n = 23) risk equations, only age was common to all four. The Ochsner model had high internal discrimination for CHD (C-statistic 0.85) and better discrimination than RECODe (C-statistic 0.45), QRISK3 (C-statistic 0.72) and AS-CVD (C-statistic 0.54).
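For readers less familiar with the discrimination measure, the following is a minimal sketch of a C-statistic for right-censored survival data. It assumes Harrell's concordance index; the abstract does not state which estimator was used, and the function name `harrell_c` and its arguments are illustrative only.

```python
def harrell_c(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times: follow-up time per patient
    events: 1 if the outcome was observed, 0 if censored
    risk_scores: model-predicted risk (higher means higher predicted risk)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had the event strictly
            # before patient j's last follow-up time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # earlier event, higher predicted risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, and 0.5 corresponds to ranking no better than chance.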
Conclusions: The Ochsner model overestimated 5-year CHD risk but was better calibrated for CHD than the other models. Risk equations fitted for local populations improved cardiovascular risk stratification for patients with T2DM. Applying machine learning yielded simpler models than the "generalized" risk equations.
Keywords: Cardiovascular disease; Cerebrovascular stroke; Diabetes; Heart failure; Machine learning; Type 2 diabetes mellitus.