Background: The aim of this study was to develop a new data-mining model to predict axillary lymph node (AxLN) metastasis in primary breast cancer. To achieve this, we used a decision tree-based prediction method-the alternating decision tree (ADTree).
Methods: Clinical datasets for primary breast cancer patients who underwent sentinel lymph node biopsy or AxLN dissection without prior treatment were collected from three institutes (institute A, n = 148; institute B, n = 143; institute C, n = 174) and were used for variable selection, model training and external validation, respectively. The models were evaluated using area under the receiver operating characteristics (ROC) curve analysis to discriminate node-positive patients from node-negative patients.
Results: The ADTree model selected 15 of 24 clinicopathological variables in the variable selection dataset. The resulting area under the ROC curve values were 0.770 [95% confidence interval (CI), 0.689-0.850] for the model training dataset and 0.772 (95% CI: 0.689-0.856) for the validation dataset, demonstrating high accuracy and generalization ability of the model. The bootstrap value of the validation dataset was 0.768 (95% CI: 0.763-0.774).
Conclusions: Our prediction model showed high accuracy for predicting nodal metastasis in patients with breast cancer using commonly recorded clinical variables. Therefore, our model might help oncologists in the decision-making process for primary breast cancer patients before starting treatment.