Using machine learning to predict radiation pneumonitis in patients with stage I non-small cell lung cancer treated with stereotactic body radiation therapy

Phys Med Biol. 2016 Aug 21;61(16):6105-20. doi: 10.1088/0031-9155/61/16/6105. Epub 2016 Jul 27.

Abstract

This study aimed to develop a patient-specific 'big data' clinical decision tool to predict pneumonitis in stage I non-small cell lung cancer (NSCLC) patients after stereotactic body radiation therapy (SBRT). Sixty-one features were recorded for 201 consecutive patients with stage I NSCLC treated with SBRT, of whom 8 (4.0%) developed radiation pneumonitis. Pneumonitis thresholds were found for each feature individually using decision stumps. The performance of three different algorithms (Decision Trees, Random Forests, RUSBoost) was evaluated. Learning curves were developed, and the training error was compared with the testing error to evaluate which factors were needed to obtain a cross-validated error smaller than 0.1. The factors considered were the addition of new features, increasing the complexity of the algorithm, and enlarging the sample size and number of events. In the univariate analysis, the most important feature selected was the diffusion capacity of the lung for carbon monoxide (DLCO adj%). On multivariate analysis, the three most important features selected were the dose to 15 cc of the heart, the dose to 4 cc of the trachea or bronchus, and race. Higher accuracy could be achieved if the RUSBoost algorithm was used with regularization. To predict radiation pneumonitis with an error smaller than 10%, we estimate that a sample size of 800 patients is required. Clinically relevant thresholds that put patients at risk of developing radiation pneumonitis were determined in a cohort of 201 stage I NSCLC patients treated with SBRT. The consistency of these thresholds can provide radiation oncologists with an estimate of their reliability and may inform treatment planning and patient counseling. The accuracy of the classification is limited by the number of patients in the study and not by the features gathered or the complexity of the algorithm.
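The per-feature threshold search described in the abstract can be illustrated with a minimal decision-stump sketch. This is not the authors' implementation: the function name `fit_stump`, the use of a class-balanced error as the split criterion, and the data are all assumptions made for illustration. The data are synthetic and only mimic the cohort's class imbalance (8 events among 201 patients); the feature values are invented and loosely labeled as a DLCO-like percentage.

```python
def fit_stump(values, labels):
    """Fit a one-feature decision stump.

    Returns (balanced_error, threshold, direction), where direction is
    "<=" if an event is predicted for values at or below the threshold
    and ">" otherwise. Balanced error is the mean of the per-class error
    rates (an assumed criterion, chosen because events are rare).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    pairs = sorted(zip(values, labels))
    best = None
    pos_below = 0  # events seen so far (at or below the candidate threshold)
    neg_below = 0  # non-events seen so far

    for i in range(len(pairs) - 1):
        value, label = pairs[i]
        if label == 1:
            pos_below += 1
        else:
            neg_below += 1

        next_value = pairs[i + 1][0]
        if value == next_value:
            continue  # no threshold can separate tied values

        threshold = (value + next_value) / 2.0
        # Rule "<=": predict an event when value <= threshold.
        err_le = 0.5 * ((n_pos - pos_below) / n_pos + neg_below / n_neg)
        # Rule ">": predict an event when value > threshold.
        err_gt = 0.5 * (pos_below / n_pos + (n_neg - neg_below) / n_neg)

        for err, direction in ((err_le, "<="), (err_gt, ">")):
            if best is None or err < best[0]:
                best = (err, threshold, direction)
    return best


# Synthetic, hypothetical data: 8 events with low feature values and
# 193 non-events spread over a higher range. Not taken from the study.
event_values = [30 + 2.5 * k for k in range(8)]
non_event_values = [45 + 65 * k / 192 for k in range(193)]
values = event_values + non_event_values
labels = [1] * len(event_values) + [0] * len(non_event_values)

err, thr, direction = fit_stump(values, labels)
print(f"predict pneumonitis when feature {direction} {thr:.1f} "
      f"(balanced error {err:.3f})")
```

A plain misclassification rate would favor the trivial rule "never predict pneumonitis" at a 4% event rate, which is why this sketch weights the two classes equally; the study itself may have used a different criterion.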

MeSH terms

  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Carcinoma, Non-Small-Cell Lung / surgery*
  • Female
  • Humans
  • Lung Neoplasms / surgery*
  • Machine Learning*
  • Male
  • Middle Aged
  • Multivariate Analysis
  • Radiation Pneumonitis / diagnosis*
  • Radiation Pneumonitis / etiology
  • Radiosurgery / adverse effects*
  • Reproducibility of Results