Background: Patients with transient ischemic attack (TIA) face a significantly increased risk of stroke. However, TIA screening and early detection rates are low, especially in developing countries. This study aims to develop an inclusive and practical TIA risk prediction model using machine learning (ML) that performs well in both hospital and resource-limited clinic settings. This model is essential for initiating the first ML-enabled learning health system (LHS) unit designed for routine and equitable TIA screening and early detection across broad populations.
Methods: Employing a novel protocol, this study first standardized data from a hospital's electronic medical records (EMR) to construct inclusive TIA risk prediction ML models using a data-centric approach. Subsequently, a quantitative distribution of TIA risk factors was applied in feature engineering to reduce the number of variables and yield a practical ML model. This refined model initiated a TIA ML-LHS unit that can be continuously updated with new EMR data from hospitals and clinics. Additionally, the practical model underwent external validation using data from another hospital.
Results: The inclusive 150-variable ML models, derived from all available EMR variables for TIA, achieved a recall of 0.868 and an accuracy of 0.886 in predicting TIA risk. Further feature engineering produced a practical XGBoost model with 20 variables, maintaining acceptable performance of 0.855 recall and 0.796 accuracy. The initialized TIA ML-LHS unit, based on the practical model, achieved performance metrics of 0.830 recall, 0.726 precision, 0.816 ROC-AUC, and 0.812 accuracy. The model also performed well in external validation, confirming its effectiveness with patient data from different clinical settings.
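For readers less familiar with the reported metrics, the following is a minimal sketch of how recall, precision, and accuracy are derived from the counts of a binary confusion matrix. The class name `ConfusionCounts`, the function `classification_metrics`, and the example counts are hypothetical and chosen only for illustration; they are not taken from the study's code or data. ROC-AUC is omitted because it requires predicted probabilities rather than counts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (hypothetical helper type)."""
    tp: int  # TIA cases correctly flagged as at risk
    fp: int  # non-TIA cases incorrectly flagged
    tn: int  # non-TIA cases correctly not flagged
    fn: int  # TIA cases missed by the model


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return recall, precision, and accuracy using their standard definitions."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "recall": c.tp / (c.tp + c.fn),      # share of true cases detected
        "precision": c.tp / (c.tp + c.fp),   # share of flagged cases that are true
        "accuracy": (c.tp + c.tn) / total,   # share of all predictions that are correct
    }


# Hypothetical counts for illustration only; NOT data from the study.
example = ConfusionCounts(tp=80, fp=30, tn=70, fn=20)
metrics = classification_metrics(example)
```

In a screening context, recall is typically weighted most heavily because a missed case (a false negative) is costlier than an unnecessary follow-up, which is consistent with recall being reported first for each model above.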
Conclusions: This study developed the first inclusive and practical TIA XGBoost model from full hospital EMR data and initiated the first TIA risk prediction ML-LHS unit. Because this TIA model requires only 20 variables, the ML-LHS can serve not only patients in hospitals but also those in resource-limited clinics. These results have significant implications for expanding risk-based TIA screening in community and rural clinics, thereby enhancing early detection of TIA among underserved populations and improving health equity. The novel protocol used in this study is also applicable to initiating ML-LHS units for various preventable diseases, providing a new system-level approach to responsible AI development and applications.
Keywords: Early detection; Electronic medical records; Learning health system; Machine learning; Responsible AI; Risk prediction; Screening; Transient ischemic attack.
© 2024. The Author(s).