A machine learning ensemble approach for 5- and 10-year breast cancer invasive disease event classification

Raffaella Massafra; Maria Colomba Comes; Samantha Bove; Vittorio Didonna; Sergio Diotaiuti; Francesco Giotta; Agnese Latorre; Daniele La Forgia; Annalisa Nardone; Domenico Pomarico; Cosmo Maurizio Ressa; Alessandro Rizzo; Pasquale Tamborra; Alfredo Zito; Vito Lorusso; Annarita Fanizzi

doi:10.1371/journal.pone.0274691

PLoS One. 2022 Sep 19;17(9):e0274691. doi: 10.1371/journal.pone.0274691. eCollection 2022.

Authors

Raffaella Massafra¹, Maria Colomba Comes¹, Samantha Bove¹, Vittorio Didonna¹, Sergio Diotaiuti¹, Francesco Giotta¹, Agnese Latorre¹, Daniele La Forgia¹, Annalisa Nardone¹, Domenico Pomarico²,³, Cosmo Maurizio Ressa¹, Alessandro Rizzo¹, Pasquale Tamborra¹, Alfredo Zito¹, Vito Lorusso¹, Annarita Fanizzi¹

Affiliations

¹ I.R.C.C.S. Istituto Tumori "Giovanni Paolo II", Bari, Italy.
² Dipartimento di Fisica and MECENAS, Università di Bari, Bari, Italy.
³ INFN, Sezione di Bari, Bari, Italy.

Abstract

Designing targeted treatments for breast cancer patients after primary tumor removal is necessary to prevent invasive disease events (IDEs), such as recurrence, metastasis, contralateral tumors and second tumors, from occurring over time. However, because of the molecular heterogeneity of the disease, predicting the outcome and efficacy of adjuvant therapy is challenging. A novel ensemble machine learning classification approach was developed to produce prognostic predictions of the occurrence of breast cancer IDEs at both 5 and 10 years. The method is based on voting among multiple models to give a final prediction for each individual patient. Promising results were achieved on a cohort of 529 patients, whose data, related to primary breast cancer, were provided by Istituto Tumori "Giovanni Paolo II" in Bari, Italy. The proposed approach substantially improves on the performance of the original baseline model, i.e., the model without voting, reaching a median AUC of 77.1% and 76.3% for IDE prediction at 5 and 10 years, respectively. Finally, because the voting procedure returns an explanation of the IDE prediction for each individual patient, the approach supports more intelligible decisions and may therefore gain greater acceptance in clinical practice.
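The abstract does not spell out the voting rule, so the following is only a minimal sketch of the general idea, assuming plain majority voting among binary classifiers. The function `vote`, the three threshold rules and the feature names (`age`, `ki67`, `tumor_size_mm`) are hypothetical illustrations and are not taken from the paper. The sketch shows how one patient receives a final label together with the per-model votes that can serve as an explanation.

```python
from typing import Callable, Dict, Tuple

# Hypothetical base model: maps a patient's features to 1 (IDE) or 0 (no IDE).
Model = Callable[[Dict[str, float]], int]


def vote(models: Dict[str, Model],
         patient: Dict[str, float]) -> Tuple[int, float, Dict[str, int]]:
    """Majority vote (an assumption, not necessarily the paper's rule).

    Returns the final label, the share of models voting for an IDE,
    and the individual ballots, which act as a per-patient explanation.
    """
    if not models:
        raise ValueError("at least one model is required")
    ballots = {name: int(model(patient)) for name, model in models.items()}
    share = sum(ballots.values()) / len(ballots)
    # Strict majority: a tie (possible with an even number of models) yields 0.
    label = 1 if share > 0.5 else 0
    return label, share, ballots


# Illustrative threshold rules standing in for trained classifiers.
models: Dict[str, Model] = {
    "age_rule": lambda p: int(p["age"] < 40),
    "ki67_rule": lambda p: int(p["ki67"] >= 20),
    "size_rule": lambda p: int(p["tumor_size_mm"] > 20),
}

# Made-up patient record, for demonstration only.
patient = {"age": 52, "ki67": 35, "tumor_size_mm": 28}

label, share, ballots = vote(models, patient)
print(label, round(share, 3), ballots)
```

In this toy case two of the three rules vote for an IDE, so the final label is 1, and the ballots show which models drove the decision. The share of positive votes could also be used as a continuous score when computing an AUC.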

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Breast Neoplasms* / pathology
Combined Modality Therapy
Female
Humans
Italy
Machine Learning

Grants and funding

This work was supported by funding from the Italian Ministry of Health, Ricerca Finalizzata 2018 deliberation n.812/2020.