Development and validation of a machine learning-based diagnostic model for Parkinson's disease in community-dwelling populations: Evidence from the China health and retirement longitudinal study (CHARLS)

Parkinsonism Relat Disord. 2025 Jan:130:107182. doi: 10.1016/j.parkreldis.2024.107182. Epub 2024 Oct 30.

Abstract

Background: Parkinson's disease (PD) is a major neurodegenerative disorder among middle-aged and older adults. There is a pressing need for effective predictive models, particularly for the Chinese population.

Objective: This study aims to develop and validate a machine learning-based diagnostic model to identify individuals with PD in community-dwelling populations using data from the China Health and Retirement Longitudinal Study (CHARLS).

Methods: We used data from 19,134 individuals aged 45 years and older in the CHARLS dataset, 265 of whom reported having PD. The external validation cohort included 1500 individuals, 21 (1.4%) of whom had PD. The random forest (RF) algorithm was used to develop an interpretable PD prediction model, which was internally validated using 10-fold cross-validation and externally validated with a dataset from Northern Jiangsu People's Hospital. SHapley Additive exPlanations (SHAP) values were used to explain the model's predictions.
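With only 265 PD cases among 19,134 participants (about 1.4%), how the 10 folds are formed matters: a purely random split can leave some folds with noticeably fewer cases than others. The abstract does not say whether the folds were stratified, so the sketch below assumes stratification by outcome purely for illustration. The function names `stratified_kfold` and `cv_splits` are hypothetical, and the random forest fit itself is omitted; in practice one would train on each training split and score the held-out fold (scikit-learn's `StratifiedKFold` provides the same kind of split).

```python
import random


def stratified_kfold(labels, k=10, seed=0):
    """Assign each sample index to one of k folds so that every fold keeps
    roughly the same case/non-case ratio as the full dataset."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    # Group sample indices by class label (e.g. 1 = PD, 0 = no PD).
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    # Shuffle within each class, then deal indices round-robin across folds.
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[j % k].append(idx)
    return folds


def cv_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = stratified_kfold(labels, k, seed)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Illustrative labels matching the cohort sizes in the abstract.
labels = [1] * 265 + [0] * (19134 - 265)
for train, test in cv_splits(labels):
    n_cases = sum(labels[i] for i in test)
    print(len(test), n_cases)  # each held-out fold gets 26 or 27 PD cases
```

Dealing each class round-robin guarantees that no fold differs from another by more than one case, which keeps the per-fold sensitivity estimates comparable when positives are scarce.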

Results: The RF model performed robustly, with an area under the curve (AUC) of 0.884 and high sensitivity, specificity, and F1 scores. In the external validation cohort, the model achieved an AUC of 0.82 and an accuracy of 0.99. Performance remained consistent across the internal and external validation cohorts. SHAP analysis provided insight into the importance of, and interactions among, the predictors, enhancing model interpretability.
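Because the external cohort has a prevalence of only 1.4% (21 of 1500), accuracy is best read alongside sensitivity, specificity, F1, and AUC. The sketch below computes these metrics from first principles, with AUC taken as the probability that a randomly chosen case is scored higher than a randomly chosen non-case. The function names are hypothetical, and the example uses a trivial "predict nobody has PD" baseline, not the study's model or data, purely to show what accuracy alone can hide at this prevalence.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 = PD."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Threshold-based metrics; undefined ratios are reported as 0.0."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "f1": f1}


def roc_auc(y_true, scores):
    """AUC via pairwise comparison of case and non-case scores (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical all-negative baseline on a cohort with 21 cases out of 1500.
y_true = [1] * 21 + [0] * 1479
baseline = classification_metrics(y_true, [0] * 1500)
print(baseline)  # accuracy is 1479/1500 = 0.986, yet sensitivity and F1 are 0
```

At this prevalence, a classifier that never flags anyone already reaches an accuracy of about 0.986, so the threshold-free AUC and the case-focused sensitivity and F1 carry most of the information about how well cases are detected.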

Conclusion: This study presents a highly accurate and interpretable machine learning-based diagnostic model for identifying individuals with PD among middle-aged and older Chinese adults. By combining predictive risk factors with chronic disease information, the model offers valuable insights for early identification and intervention, potentially mitigating PD progression.

Keywords: CHARLS; Lifestyle factors; Machine learning; Parkinson's disease; Predictive model; SHAP analysis.

Publication types

  • Validation Study

MeSH terms

  • Aged
  • Aged, 80 and over
  • China
  • Female
  • Humans
  • Independent Living*
  • Longitudinal Studies
  • Machine Learning*
  • Male
  • Middle Aged
  • Parkinson Disease* / diagnosis