Classifying Future Healthcare Utilization in COPD Using Quantitative CT Lung Imaging and Two-Step Feature Selection via Sparse Subspace Learning with the CanCOLD Study

Acad Radiol. 2024 Oct;31(10):4221-4230. doi: 10.1016/j.acra.2024.03.030. Epub 2024 Apr 15.

Abstract

Rationale: Although numerous candidate features exist for predicting the risk of higher healthcare utilization in patients with chronic obstructive pulmonary disease (COPD), the process for selecting the most discriminative features remains unclear.

Objective: The objective of this study was to develop a robust feature selection method to identify the most discriminative candidate features for predicting healthcare utilization in COPD, and to compare its model performance with that of other common feature selection methods.

Materials and methods: In this retrospective study, demographic data, lung function measurements, and CT images were collected from 454 COPD participants in the Canadian Cohort Obstructive Lung Disease (CanCOLD) study between 2010 and 2017. A follow-up visit was completed approximately 1.5 years later, at which participants reported healthcare utilization. CT analysis was performed for feature extraction. A two-step hybrid feature selection method was proposed that used (1) sparse subspace learning with nonnegative matrix factorization and (2) a genetic algorithm. Seven commonly used feature selection methods, each reporting its top 10 or 20 features, were also implemented for comparison. Performance was evaluated using accuracy.
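
The two-step idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the sparse subspace learning formulation, the genetic algorithm settings, or the classifier, so everything below is an assumption made for illustration. Step 1 is replaced by a plain nonnegative matrix factorization (multiplicative updates) that ranks features by the norm of their loadings, and step 2 is a small genetic algorithm over binary feature masks whose fitness is the hold-out accuracy of a nearest-centroid classifier. The data are synthetic, and all function names and parameters (`nmf_feature_scores`, `genetic_search`, `k`, `pop`, `gens`, `p_mut`) are hypothetical.

```python
import numpy as np

def nmf_feature_scores(X, k=5, iters=200, seed=0):
    """Factorize nonnegative X (n x d) as W @ H with multiplicative updates,
    then score feature j by the L2 norm of column j of H.
    Simplified stand-in for sparse subspace learning, not the paper's method."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.random((n, k)) + 1e-3
    H = rng.random((k, d)) + 1e-3
    eps = 1e-9
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return np.linalg.norm(H, axis=0)

def holdout_accuracy(X, y, mask, train, test):
    """Nearest-centroid accuracy on held-out rows using only masked features."""
    if not mask.any():
        return 0.0
    Xm = X[:, mask]
    c0 = Xm[train][y[train] == 0].mean(axis=0)
    c1 = Xm[train][y[train] == 1].mean(axis=0)
    d0 = np.linalg.norm(Xm[test] - c0, axis=1)
    d1 = np.linalg.norm(Xm[test] - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == y[test]).mean())

def genetic_search(X, y, pop=20, gens=15, p_mut=0.1, seed=0):
    """Search binary feature masks; returns the best mask seen and its fitness.
    The fitness is measured on the same hold-out split used for selection,
    so it is optimistic and not a substitute for a separate test set."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    idx = rng.permutation(n)
    train, test = idx[: n * 7 // 10], idx[n * 7 // 10:]
    population = rng.random((pop, d)) < 0.5
    best_mask, best_fit = population[0].copy(), -1.0
    for _ in range(gens):
        scores = np.array([holdout_accuracy(X, y, m, train, test)
                           for m in population])
        if scores.max() > best_fit:
            best_fit = float(scores.max())
            best_mask = population[scores.argmax()].copy()
        # Tournament selection: the fitter of two random individuals survives.
        a, b = rng.integers(0, pop, (2, pop))
        parents = population[np.where(scores[a] >= scores[b], a, b)]
        # Uniform crossover with a randomly shuffled partner.
        partners = parents[rng.permutation(pop)]
        children = np.where(rng.random((pop, d)) < 0.5, parents, partners)
        # Bit-flip mutation.
        children ^= rng.random((pop, d)) < p_mut
        population = children
    return best_mask, best_fit

# Synthetic data (assumption): 200 subjects, 30 features, 5 of them informative.
rng = np.random.default_rng(42)
n, d = 200, 30
y = rng.integers(0, 2, n)
X = rng.random((n, d))
X[:, :5] += 0.5 * y[:, None]
X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))  # min-max to [0, 1]

scores = nmf_feature_scores(X, k=5)            # step 1: rank all features
shortlist = np.argsort(scores)[::-1][:15]      # keep the 15 highest-scoring
mask, acc = genetic_search(X[:, shortlist], y) # step 2: refine the subset
selected = shortlist[mask]
print("selected feature indices:", selected, "hold-out accuracy:", acc)
```

The split into a cheap ranking step followed by a subset search reflects the general motivation for hybrid selection: the first step shrinks the search space so that the genetic algorithm evaluates far fewer candidate subsets. Because step 1 here is unsupervised and deliberately crude, it is not guaranteed to shortlist the informative synthetic features.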

Results: Of the 454 COPD participants evaluated, 161 (35%) utilized healthcare services at follow-up. The accuracy for predicting subsequent healthcare utilization for the seven commonly used feature selection methods ranged from 72%-76% with the top 10 features, and 77%-80% with the top 20 features. Relative to these methods, hybrid feature selection obtained significantly higher accuracy for predicting subsequent healthcare utilization at 82% ± 3% (p < 0.05). Selected features with the proposed method included: DLCO, FEV1, RV, FVC, TAC, LAA950, Pi-10, LAA856, LAC total hole count, outer area RB1, wall area RB1, wall area and Jacobian.
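
As context for reading the accuracy figures, the class balance reported above can be checked with simple arithmetic. The short sketch below uses only the two counts given in the abstract (454 participants, 161 with healthcare utilization) and derives the accuracy a trivial classifier would reach by always predicting the majority class. This baseline is not reported in the abstract and is shown here only as an illustrative reference point.

```python
n_total = 454      # COPD participants evaluated (from the abstract)
n_utilized = 161   # participants with healthcare utilization at follow-up

prevalence = n_utilized / n_total
# Accuracy of always predicting the more frequent class (no utilization).
majority_baseline = max(prevalence, 1 - prevalence)

print(f"prevalence: {prevalence:.1%}")                # prevalence: 35.5%
print(f"majority baseline: {majority_baseline:.1%}")  # majority baseline: 64.5%
```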

Conclusion: The hybrid feature selection method identified the most discriminative features for classifying individuals with and without future healthcare utilization, and increased the accuracy compared to other state-of-the-art approaches.

MeSH terms

  • Aged
  • Algorithms
  • Canada
  • Female
  • Humans
  • Machine Learning
  • Male
  • Middle Aged
  • Patient Acceptance of Health Care* / statistics & numerical data
  • Pulmonary Disease, Chronic Obstructive* / diagnostic imaging
  • Retrospective Studies
  • Tomography, X-Ray Computed* / methods