Objective: To create risk prediction models for healthcare-seeking delay among imported malaria patients in Jiangsu Province using machine learning algorithms, so as to provide a basis for the early identification of imported malaria cases in the province.
Methods: Case investigation data, first symptoms and time of initial diagnosis for imported malaria patients reported in Jiangsu Province in 2019 were retrieved from the Infectious Disease Report Information Management System and the Parasitic Disease Prevention and Control Information Management System of the Chinese Center for Disease Control and Prevention. Risk prediction models for healthcare-seeking delay among imported malaria patients were built with a back propagation (BP) neural network, logistic regression, random forest and a Bayesian model. Thirteen factors served as independent variables: occupation, species of malaria parasite, main clinical manifestations, presence of complications, severity of disease, age, duration of residing abroad, number of malaria infections abroad, incubation period, level of institution at initial diagnosis, country of origin, number of individuals travelling with the patient and way of going abroad. Healthcare-seeking delay (≤ 24 h vs. > 24 h) served as the dependent variable. The logistic regression model was visualized using a nomogram, and the nomogram was evaluated using calibration curves. The predictive performance of the four models was compared using the area under the receiver operating characteristic (ROC) curve (AUC). The SHAP algorithm was used to quantify and attribute the importance of each feature and to examine whether the value of each feature had a positive or negative effect on the predictions.
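As an illustration of the AUC metric used above to compare the models, the following is a minimal sketch in plain Python. It uses the rank-based interpretation of the AUC (the probability that a randomly chosen delayed case receives a higher predicted risk than a randomly chosen non-delayed case). The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs in which
    the positive case has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (NOT study data):
# label 1 = delay > 24 h, label 0 = delay <= 24 h; scores = predicted risk.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sample of a few hundred patients; a sort-based rank computation would be preferable for larger data sets.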
Results: A total of 244 imported malaria patients were enrolled, including 100 cases (40.98%) in which the interval from onset of first symptoms to initial diagnosis exceeded 24 hours. Logistic regression analysis identified a history of malaria parasite infection [odds ratio (OR) = 3.075, 95% confidence interval (CI): (1.597, 5.923)], long incubation period [OR = 1.010, 95% CI: (1.001, 1.018)] and seeking healthcare in provincial or municipal medical facilities [OR = 12.550, 95% CI: (1.158, 135.963)] as risk factors for healthcare-seeking delay among imported malaria cases. BP neural network modeling showed that duration of residing abroad, incubation period and age had the greatest impact on healthcare-seeking delay among imported malaria patients. Random forest modeling showed that the top five factors affecting healthcare-seeking delay were, in order, main clinical manifestations, way of going abroad, incubation period, duration of residing abroad and age. Bayesian modeling showed that the top five factors were, in order, level of institution at initial diagnosis, age, country of origin, history of malaria parasite infection and individuals travelling with the patient. ROC curve analysis showed higher overall performance of the BP neural network model and the logistic regression model for prediction of the risk of healthcare-seeking delay among imported malaria patients (Z = 2.700 to 4.641, all P values < 0.01), with no statistically significant difference in the AUC between these two models (Z = 1.209, P > 0.05). The sensitivity (71.00%) and Youden index (43.92%) of the logistic regression model were higher than those of the BP neural network model (63.00% and 36.61%, respectively), whereas the specificity of the BP neural network model (73.61%) was higher than that of the logistic regression model (72.92%).
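The reported Youden indices can be checked from the reported sensitivity and specificity, since the Youden index equals sensitivity + specificity − 1. The sketch below does this arithmetic and also back-calculates the approximate numbers of correctly classified cases. That second step is an assumption: it presumes the metrics were computed on all 244 patients (100 delayed, 144 not delayed), which the abstract does not state. The names `youden_index` and `summary` are hypothetical.

```python
def youden_index(sensitivity, specificity):
    """Youden index J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Counts from the abstract: 244 patients, 100 with delay > 24 h.
n_total, n_delayed = 244, 100
n_not_delayed = n_total - n_delayed  # 144

# Sensitivity and specificity as reported in the abstract.
reported = {
    "logistic regression": (0.7100, 0.7292),
    "BP neural network": (0.6300, 0.7361),
}

summary = {}
for name, (sens, spec) in reported.items():
    j = youden_index(sens, spec)
    # Assumption: metrics were evaluated on the full sample of 244 patients.
    true_pos = round(sens * n_delayed)
    true_neg = round(spec * n_not_delayed)
    summary[name] = (round(j, 4), true_pos, true_neg)
    print(f"{name}: J = {j:.4f}, TP ~ {true_pos}, TN ~ {true_neg}")
```

Under that assumption, the two models differ by eight correctly identified delayed cases and by a single correctly identified non-delayed case, which is consistent with the small specificity gap reported above.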
Conclusions: In Jiangsu Province, imported malaria cases with a long duration of residing abroad, a history of malaria parasite infection, a long incubation period, older age or initial healthcare-seeking in provincial or municipal medical institutions have a high likelihood of healthcare-seeking delay. The models based on logistic regression and the BP neural network show good performance for prediction of the risk of healthcare-seeking delay among imported malaria patients in Jiangsu Province, and may provide a reference for the health management of imported malaria patients.
Keywords: BP neural network model; Healthcare-seeking delay; Imported malaria; Jiangsu Province; Logistic regression model; Machine learning; Risk predictive model.