Machine learning-assisted construction of COPD self-evaluation questionnaire (COPD-EQ): a national multicentre study in China

J Glob Health. 2025 Jan 3;15:04052. doi: 10.7189/jogh.15.04052.

Abstract

Background: Approximately 70% of chronic obstructive pulmonary disease (COPD) cases worldwide go undiagnosed. We aimed to develop and validate a COPD self-evaluation questionnaire (COPD-EQ) that is better suited to COPD screening in China.

Methods: We developed a primary version of COPD-EQ using the Delphi method. We then conducted a nationwide, multicentre, prospective study to validate the screening ability of the novel COPD-EQ. To improve this screening ability, we applied a series of machine learning (ML)-based methods, including logistic regression, XGBoost, LightGBM, and CatBoost. These models were developed and then evaluated on a random 3:1 train/test split.
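As an illustration of the random 3:1 train/test split mentioned above, the following is a minimal sketch using only the Python standard library. The function name `train_test_split_3_to_1`, the seed, and the placeholder participant IDs are assumptions made for this example; the abstract does not say how the split was implemented or whether it was stratified by diagnosis. The count of 1824 is the cohort size reported in the Results.

```python
import random


def train_test_split_3_to_1(records, seed=0):
    """Shuffle a copy of `records` and split it 3:1 into (train, test).

    Hypothetical helper for illustration; not the authors' code.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = (len(shuffled) * 3) // 4         # three quarters go to training
    return shuffled[:cut], shuffled[cut:]


# Placeholder IDs standing in for the 1824 recruited outpatients.
participant_ids = list(range(1824))
train_ids, test_ids = train_test_split_3_to_1(participant_ids, seed=42)

print(len(train_ids), len(test_ids))  # 1368 456
```

With 1824 participants, a 3:1 split leaves 1368 records for model development and 456 for evaluation. A stratified split, which keeps the COPD proportion equal in both parts, would be a reasonable alternative, but that is a design choice the abstract does not specify.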

Results: Through the Delphi approach, we developed the primary version of COPD-EQ with nine items. In the subsequent prospective multicentre study, we recruited 1824 outpatients from 12 sites, of whom 404 (22.1%) were diagnosed with COPD. After score assignment assisted by the ML models and the Shapley Additive Explanations (SHAP) method, six of the nine items were retained for a briefer version of COPD-EQ. The scoring-based method achieved an area under the curve (AUC) of 0.734 at a threshold of 4.0. Finally, a novel six-item COPD-EQ was developed.
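To make the scoring-based method concrete, here is a hedged sketch of how a six-item questionnaire score could be summed, compared against a cut-off, and evaluated by AUC. The item point values in `ITEM_POINTS`, the toy answers and labels, and the choice to flag scores greater than or equal to the threshold are all hypothetical; the abstract reports only the threshold of 4.0 and the AUC of 0.734, not the actual point assignments or the decision rule.

```python
def total_score(answers, points):
    """Sum the points of every item answered 'yes' (True)."""
    return sum(p for a, p in zip(answers, points) if a)


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


ITEM_POINTS = [2, 1, 1, 2, 1, 1]  # hypothetical points for six items
THRESHOLD = 4.0                   # cut-off reported in the abstract

# Toy responses (True = "yes") and toy diagnoses (1 = COPD); not study data.
toy_answers = [
    [True, True, False, True, False, False],
    [False, True, True, False, False, False],
    [True, False, False, True, True, False],
    [False, False, False, False, True, True],
    [True, False, True, False, False, True],
    [False, True, False, True, False, False],
]
toy_labels = [1, 0, 1, 0, 0, 1]

scores = [total_score(a, ITEM_POINTS) for a in toy_answers]
# Assumption: a score at or above the threshold counts as screen-positive.
flagged = [s >= THRESHOLD for s in scores]

print(scores)                              # [5, 2, 5, 2, 4, 3]
print(flagged)
print(round(auc(scores, toy_labels), 3))   # 0.889
```

The AUC summarises how well the score ranks diagnosed participants above undiagnosed ones across all possible cut-offs, whereas the threshold fixes one operating point used to flag individuals for follow-up.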

Conclusions: The COPD-EQ was validated as reliable and accurate for COPD screening in the Chinese population. The ML model can further improve the questionnaire's screening ability.

Publication types

  • Multicenter Study

MeSH terms

  • Aged
  • China / epidemiology
  • Delphi Technique
  • Diagnostic Self Evaluation
  • Female
  • Humans
  • Machine Learning*
  • Male
  • Mass Screening / methods
  • Middle Aged
  • Prospective Studies
  • Pulmonary Disease, Chronic Obstructive* / diagnosis
  • Reproducibility of Results
  • Surveys and Questionnaires