Objective: To identify stable and discriminating radiomic features on non-contrast CT scans to develop more generalisable radiomic classifiers for distinguishing granulomas from adenocarcinomas.
Methods: In total, 412 patients with adenocarcinomas or granulomas from three institutions were retrospectively included. The lung nodules were segmented manually by an expert radiologist in a 2D axial view. Radiomic features were extracted from intra- and perinodular regions. A total of 145 patients formed the training set (Str), 205 patients formed test set I (Ste1) and 62 patients formed independent test set II (Ste2). To mitigate variation in CT acquisition parameters, we defined 'stable' radiomic features as those whose feature expression remains relatively unchanged between different sites, as assessed using a Wilcoxon rank-sum test. These stable features were used to develop more generalisable radiomic classifiers that were more resilient to variations in lung CT scans. Features were ranked using two criteria: first by discriminability alone (i.e. maximising the area under the receiver operating characteristic curve, AUC) and second by maximising both feature stability and discriminability. Different machine-learning classifiers (linear discriminant analysis, quadratic discriminant analysis, support vector machines and random forest) were trained with features selected using the two criteria and then compared, in terms of AUC, on the two independent test sets for distinguishing granulomas from adenocarcinomas.
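The two ranking criteria can be illustrated with a minimal sketch. It is not the study's implementation: the abstract does not state how stability and discriminability were combined, so the sketch assumes a simple filter-then-rank scheme (keep features whose rank-sum p-value between two sites is at least a hypothetical threshold `alpha = 0.05`, then sort by AUC). The function names, the synthetic feature names and the normal approximation without tie correction are likewise assumptions made for illustration.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return math.erfc(abs((w - mean) / sd) / math.sqrt(2))

def auc(scores, labels):
    """AUC from the Mann-Whitney U statistic; labels are 0/1."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    u = sum(r for r, l in zip(ranks, labels) if l == 1) - n_pos * (n_pos + 1) / 2
    a = u / (n_pos * n_neg)
    return max(a, 1 - a)  # treat either direction of separation as discriminating

def rank_features(features, labels, sites, alpha=0.05, require_stable=True):
    """Rank features by AUC; optionally drop those that differ between sites.

    Assumption: 'stable' means rank-sum p >= alpha between site 0 and site 1.
    """
    ranked = []
    for name, vals in features.items():
        site_a = [v for v, s in zip(vals, sites) if s == 0]
        site_b = [v for v, s in zip(vals, sites) if s == 1]
        p_site = rank_sum_p(site_a, site_b)
        if require_stable and p_site < alpha:
            continue
        ranked.append((name, auc(vals, labels), p_site))
    return sorted(ranked, key=lambda t: t[1], reverse=True)

# Synthetic, deterministic toy data (not patient data): 40 cases, two sites.
n = 40
labels = [i % 2 for i in range(n)]              # hypothetical: 1 = adenocarcinoma
sites = [0 if i < 20 else 1 for i in range(n)]
jitter = [(i % 5) * 0.1 for i in range(n)]
features = {
    "stable_disc": [2 * l + j for l, j in zip(labels, jitter)],
    "unstable_disc": [2 * l + 5 * s + j for l, s, j in zip(labels, sites, jitter)],
    "noise": [float((i * 7) % 11) for i in range(n)],
}

by_auc_only = rank_features(features, labels, sites, require_stable=False)
by_both = rank_features(features, labels, sites, require_stable=True)
print([name for name, _, _ in by_auc_only])
print([name for name, _, _ in by_both])
```

In this toy example the feature with a site-dependent offset is still ranked when discriminability alone is used, but is excluded once stability is required. A joint criterion that weights stability against AUC, rather than filtering, would be an equally plausible reading of the second criterion.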
Results: In the test sets, classifiers constructed using the criterion that maximised feature stability and discriminability simultaneously achieved higher AUCs than those constructed using discriminability alone (Ste1 [n = 205]: maximum AUCs of 0.85 versus 0.80; p-value = 0.047 and Ste2 [n = 62]: maximum AUCs of 0.87 versus 0.79; p-value = 0.021). These differences held for features extracted from scans with <3 mm slice thickness (AUC = 0.88 versus 0.80; p-value = 0.039, n = 100) and for the ≥3 mm cases (AUC = 0.81 versus 0.76; p-value = 0.034, n = 105). In both experiments, shape and peritumoural texture features were more stable than intratumoural texture features.
Conclusions: Our study suggests that explicitly accounting for both stability and discriminability results in more generalisable radiomic classifiers to distinguish adenocarcinomas from granulomas on non-contrast CT scans. Our results also showed that peritumoural texture and shape features were less affected by the scanner parameters compared with intratumoural texture features; however, they were also less discriminating compared with intratumoural features.
Keywords: Lung cancer; Machine learning; Malignant nodule; NSCLC; Radiomics; Stability.
Copyright © 2021 Elsevier Ltd. All rights reserved.