Background: Accurate classification of lung nodules is critical to personalized lung cancer treatment and prognosis prediction. Treatment options and patient prognosis are closely related to nodule type, but lung nodules fall into many types, and the distinctions between some of them are subtle, which makes accurate classification based on conventional medical imaging and physician experience challenging. This study takes a novel approach: it uses computed tomography (CT) radiomics to analyze quantitative features in CT images that characterize lung nodules, and then applies diversity-weighted ensemble learning, which integrates the predictions of multiple models, to improve classification accuracy.
Methods: We extracted lung nodules from the Lung Image Database Consortium image collection (LIDC-IDRI) dataset and derived radiomics features from them. The classification tasks for seven types of lung nodules were each split into binary classification problems. Two model-building methods were employed: M1, an equal-weighted voting ensemble classifier, and M2, a diversity-weighted voting ensemble classifier. Models were evaluated using 10-fold cross-validation, with the area under the receiver operating characteristic curve (AUC), accuracy, specificity, and sensitivity as metrics.
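The difference between the two voting schemes can be illustrated with a minimal sketch. The abstract does not state which diversity measure or base classifiers were used, so the sketch assumes, purely for illustration, that each classifier is weighted by its mean pairwise disagreement with the other classifiers. All function names are hypothetical and are not taken from the study.

```python
def disagreement(a, b):
    """Fraction of samples on which two classifiers' 0/1 predictions differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def diversity_weights(preds):
    """Assumed diversity measure: mean pairwise disagreement with the
    other classifiers, normalized so that the weights sum to 1."""
    n = len(preds)
    raw = [
        sum(disagreement(preds[i], preds[j]) for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    ]
    total = sum(raw)
    if total == 0:  # all classifiers agree everywhere: fall back to equal weights
        return [1.0 / n] * n
    return [r / total for r in raw]


def weighted_vote(preds, weights):
    """Weighted majority vote over binary predictions (ties go to class 1)."""
    threshold = 0.5 * sum(weights)
    n_samples = len(preds[0])
    return [
        1 if sum(w * p[k] for w, p in zip(weights, preds)) >= threshold else 0
        for k in range(n_samples)
    ]


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }


# Toy predictions from three hypothetical base classifiers on six samples.
y_true = [1, 1, 1, 0, 0, 0]
preds = [
    [1, 1, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 1],  # identical to the first classifier
    [1, 0, 1, 0, 1, 0],
]
m1_pred = weighted_vote(preds, [1, 1, 1])               # M1-style: equal weights
m2_pred = weighted_vote(preds, diversity_weights(preds))  # M2-style: diversity weights
m1_scores = binary_metrics(y_true, m1_pred)
```

Under this assumed measure, a classifier that duplicates another receives a smaller weight than one that contributes different predictions, which is the general intuition behind diversity weighting. The scheme used in the study may differ.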
Results: Both methods completed the classification tasks effectively. The M2 method outperformed M1, particularly in classifying texture, calcification, and the benign or malignant nature of lung nodules. For the four texture-type subclassifications, the M2 method achieved AUC values of 0.9913, 0.8838, 0.9525, and 0.8845, with corresponding accuracies of 0.9651, 0.8116, 0.9000, and 0.8284, respectively. For the degree of calcification, the M2 method achieved an AUC of 0.9775 and an accuracy of 0.9642. For the benign or malignant nature of lung nodules, the M2 method achieved an AUC of 0.8953 and an accuracy of 0.8168. The combination of CT radiomics and diversity-weighted ensemble learning effectively identifies lung nodule types, providing a novel method for the precise classification of lung nodules and aiding personalized lung cancer treatment and prognosis prediction.
Conclusions: Combining CT radiomics with diversity-weighted ensemble learning can effectively identify the type of lung nodules.
Keywords: Radiomics; ensemble classifier; lung nodule classification; machine learning; medical images.
2024 AME Publishing Company. All rights reserved.