Developing and Validating a Survival Prediction Model for NSCLC Patients Through Distributed Learning Across 3 Countries

Arthur Jochems; Timo M Deist; Issam El Naqa; Marc Kessler; Chuck Mayo; Jackson Reeves; Shruti Jolly; Martha Matuszak; Randall Ten Haken; Johan van Soest; Cary Oberije; Corinne Faivre-Finn; Gareth Price; Dirk de Ruysscher; Philippe Lambin; Andre Dekker

doi:10.1016/j.ijrobp.2017.04.021

Int J Radiat Oncol Biol Phys. 2017 Oct 1;99(2):344-352. doi: 10.1016/j.ijrobp.2017.04.021. Epub 2017 Apr 24.

Affiliations

¹ Department of Radiation Oncology (MAASTRO), GROW-School for Oncology and Developmental Biology, Maastricht University Medical Centre, Maastricht, The Netherlands. Electronic address: arthur.jochems@maastro.nl.
² Department of Radiation Oncology (MAASTRO), GROW-School for Oncology and Developmental Biology, Maastricht University Medical Centre, Maastricht, The Netherlands.
³ Department of Radiation Oncology, University of Michigan, Ann Arbor, Michigan.
⁴ The University of Manchester, Manchester Academic Health Science Centre, The Christie NHS Foundation Trust, Manchester, UK.

Abstract

Purpose: Tools for survival prediction for non-small cell lung cancer (NSCLC) patients treated with chemoradiation or radiation therapy are of limited quality. In this work, we developed a predictive model of survival at 2 years. The model is based on a large volume of historical patient data and serves as a proof of concept to demonstrate the distributed learning approach.

Methods and materials: Clinical data from 698 lung cancer patients, treated with curative intent with chemoradiation or radiation therapy alone, were collected and stored at 2 different cancer institutes (559 patients at Maastro Clinic [the Netherlands] and 139 at the University of Michigan [United States]). The model was further validated on 196 patients originating from The Christie (United Kingdom). A Bayesian network model was adapted for distributed learning (an animation can be viewed at https://www.youtube.com/watch?v=ZDJFOxpwqEA). Two-year posttreatment survival was chosen as the endpoint. The Maastro Clinic cohort data are publicly available at https://www.cancerdata.org/publication/developing-and-validating-survival-prediction-model-nsclc-patients-through-distributed, and the developed models can be found at www.predictcancer.org.
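The abstract does not spell out how the Bayesian network was adapted for distributed learning. One common way to estimate the conditional probability tables (CPTs) of a network with a fixed structure across sites is to let each site compute local counts, send only those aggregated counts to a coordinator, and normalize the summed counts there. The sketch below assumes exactly that, with complete data and maximum-likelihood estimates. The function names (`local_counts`, `merge_counts`, `cpt_from_counts`) and the toy records are hypothetical and not taken from the study, and this is not the authors' implementation. Real clinical data would also require handling of missing values, which this sketch ignores.

```python
from collections import Counter
from itertools import product


def local_counts(records, parents, child):
    """Run at each site: count (parent values..., child value) combinations.
    Only these counts leave the site, never the patient records."""
    counts = Counter()
    for r in records:
        counts[tuple(r[p] for p in parents) + (r[child],)] += 1
    return counts


def merge_counts(per_site):
    """Run at the coordinator: sum the count tables received from all sites."""
    total = Counter()
    for c in per_site:
        total.update(c)  # Counter.update adds counts rather than replacing them
    return total


def cpt_from_counts(counts, parent_domains, child_domain):
    """Maximum-likelihood CPT P(child | parents) from pooled counts.
    Parent configurations never observed are left out (no estimate)."""
    cpt = {}
    for pv in product(*parent_domains):
        n = sum(counts[pv + (cv,)] for cv in child_domain)
        if n:
            cpt[pv] = {cv: counts[pv + (cv,)] / n for cv in child_domain}
    return cpt


# Hypothetical toy records (T category, N category, 2-year survival); not study data.
site_a = [{"T": 1, "N": 0, "surv2y": 1}, {"T": 2, "N": 2, "surv2y": 0},
          {"T": 1, "N": 0, "surv2y": 0}]
site_b = [{"T": 1, "N": 0, "surv2y": 1}, {"T": 2, "N": 2, "surv2y": 0}]

parents, child = ["T", "N"], "surv2y"
domains, outcome = [[1, 2], [0, 2]], [0, 1]

# Distributed: each site shares counts only.
distributed = cpt_from_counts(
    merge_counts([local_counts(s, parents, child) for s in (site_a, site_b)]),
    domains, outcome)
# Centralized: all records pooled in one place.
centralized = cpt_from_counts(local_counts(site_a + site_b, parents, child),
                              domains, outcome)
```

With complete data and no smoothing, pooling counts reproduces the centralized estimate exactly, so in this simplified setting `distributed == centralized`. The small centralized-versus-distributed difference reported in the Results (0.6%) therefore comes from aspects of the authors' actual procedure that this sketch does not model.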

Results: Variables included in the final model were T and N category, age, performance status, and total tumor dose. The model has an area under the curve (AUC) of 0.66 on the external validation set and an AUC of 0.62 in 5-fold cross-validation. A model based only on T and N category achieved an AUC of 0.47 on the validation set, significantly worse than our model (P<.001). Learning the model in a centralized or distributed fashion yields a minor difference in the probabilities of the conditional probability tables (0.6%); the discriminative performance of the two models on the validation set is similar (P=.26).
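The AUC values above summarize discrimination: the AUC equals the probability that a randomly chosen patient from one outcome class receives a higher predicted score than a randomly chosen patient from the other class (the Mann-Whitney formulation). A minimal sketch of that computation follows. The function name `auc` is hypothetical, and treating label 1 as the "positive" class (for example, survival at 2 years) is an assumption, since the abstract does not state which class was scored as positive. It uses a simple pairwise loop, which is adequate for cohorts of a few hundred patients.

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney statistic: the fraction
    of (positive, negative) pairs in which the positive case scores higher,
    with tied scores counting as half a correct ordering.
    Assumption: label 1 marks the positive class."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities for 4 patients (not study data).
print(auc([1, 0, 1, 0], [0.8, 0.4, 0.35, 0.1]))  # 3 of 4 pairs ordered correctly: 0.75
```

An AUC of 0.5 corresponds to random ranking, so the 0.47 obtained by the T and N category model on the validation set is no better than chance, whereas 0.66 indicates modest discrimination.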

Conclusions: Distributed learning from federated databases allows learning of predictive models on data originating from multiple institutions while avoiding many of the data-sharing barriers. We believe that distributed learning is the future of sharing data in health care.

Publication types

Multicenter Study
Validation Study
Video-Audio Media

MeSH terms

Age Factors
Aged
Antineoplastic Combined Chemotherapy Protocols / therapeutic use
Area Under Curve
Bayes Theorem
Carcinoma, Non-Small-Cell Lung / mortality*
Carcinoma, Non-Small-Cell Lung / therapy*
Chemoradiotherapy / mortality
Cohort Studies
Databases, Factual / statistics & numerical data
Female
Forecasting / methods
Humans
Kaplan-Meier Estimate
Learning*
Lung Neoplasms / mortality*
Lung Neoplasms / therapy*
Lymph Nodes / pathology
Male
Models, Statistical
Neoplasm Staging / standards
Radiotherapy, Conformal / mortality
Severity of Illness Index
Time Factors
