A Comparison of Lung Nodule Segmentation Algorithms: Methods and Results from a Multi-institutional Study

Jayashree Kalpathy-Cramer; Binsheng Zhao; Dmitry Goldgof; Yuhua Gu; Xingwei Wang; Hao Yang; Yongqiang Tan; Robert Gillies; Sandy Napel

doi:10.1007/s10278-016-9859-z

J Digit Imaging. 2016 Aug;29(4):476-87. doi: 10.1007/s10278-016-9859-z.

Authors

Jayashree Kalpathy-Cramer¹, Binsheng Zhao², Dmitry Goldgof³, Yuhua Gu⁴, Xingwei Wang⁵, Hao Yang², Yongqiang Tan², Robert Gillies⁴, Sandy Napel⁶

Affiliations

¹ Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
² Department of Radiology, Columbia University Medical Center, New York, NY, USA.
³ Department of Computer Science and Engineering, University of South Florida, Tampa, FL, USA.
⁴ Departments of Cancer Imaging and Metabolism, H. Lee Moffitt Cancer Center and Research Institute, Tampa, FL, USA.
⁵ Department of Radiology, Stanford University School of Medicine, James H. Clark Center S323 318 Campus Drive, Stanford, CA, 94305-5450, USA.
⁶ Department of Radiology, Stanford University School of Medicine, James H. Clark Center S323 318 Campus Drive, Stanford, CA, 94305-5450, USA. snapel@stanford.edu.

Abstract

Tumor volume estimation and accurate, reproducible border segmentation in medical images are important in the diagnosis, staging, and assessment of response to cancer therapy. The goal of this study was to demonstrate the feasibility of a multi-institutional effort to assess the repeatability and reproducibility of nodule borders and the volume estimate bias of computerized segmentation algorithms in CT images of lung cancer, and to provide results from such a study. The dataset used for this evaluation consisted of 52 tumors in 41 CT volumes (40 patient datasets and 1 dataset containing scans of 12 phantom nodules of known volume) from five collections available in The Cancer Imaging Archive. Three academic institutions developing lung nodule segmentation algorithms submitted results for three repeat runs for each of the nodules. We compared the performance of the algorithms using several measures of spatial overlap and of volume. Nodules ranged in size from 29 μl to 66 ml and varied widely in shape. Agreement in spatial overlap of segmentations was significantly higher for multiple runs of the same algorithm than between segmentations generated by different algorithms (p < 0.05), and was significantly higher on the phantom dataset than on the other datasets (p < 0.05). Algorithms differed significantly in the bias of the measured volumes of the phantom nodules (p < 0.05), underscoring the need for assessing performance on clinical data in addition to phantoms. The algorithms that most accurately estimated nodule volumes were not the most repeatable, emphasizing the need to evaluate both accuracy and precision. There were considerable differences between algorithms, especially in a subset of heterogeneous nodules, underscoring the recommendation that the same software be used at all time points in longitudinal studies.
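As an illustration of the kind of measurements described above, the following is a minimal sketch of how spatial overlap between two segmentations and volume bias against a known phantom volume might be computed. The abstract does not name the specific overlap measures used in the study; the Dice and Jaccard coefficients here are common choices and are an assumption, as are all function names, the voxel size, and the toy cube-shaped masks.

```python
from itertools import product

# Illustrative sketch only: masks are sets of (x, y, z) voxel indices.
# Dice and Jaccard are assumed overlap measures, not confirmed by the abstract.

def dice(a, b):
    """Dice coefficient 2|A∩B| / (|A| + |B|); defined as 1.0 if both are empty."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

def jaccard(a, b):
    """Jaccard index |A∩B| / |A∪B|; defined as 1.0 if both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def volume_ml(mask, voxel_mm3):
    """Segmented volume in ml (1 ml = 1000 mm^3; 1 mm^3 = 1 μl)."""
    return len(mask) * voxel_mm3 / 1000.0

def relative_bias(measured, reference):
    """Signed relative volume error against a known reference volume."""
    return (measured - reference) / reference

def cube(origin, side):
    """Hypothetical toy mask: a solid cube of voxels starting at origin."""
    ox, oy, oz = origin
    return {(ox + i, oy + j, oz + k)
            for i, j, k in product(range(side), repeat=3)}

# Two hypothetical runs that differ by a one-voxel shift along x.
run_a = cube((0, 0, 0), 10)
run_b = cube((1, 0, 0), 10)

print("Dice:", dice(run_a, run_b))
print("Jaccard:", jaccard(run_a, run_b))

# Assumed 1 mm^3 voxels and an assumed phantom reference volume of 1.25 ml.
measured = volume_ml(run_a, voxel_mm3=1.0)
print("Volume (ml):", measured)
print("Relative bias:", relative_bias(measured, 1.25))
```

Comparing repeat runs of one algorithm with such overlap measures speaks to repeatability, comparing different algorithms speaks to reproducibility, and the relative bias is only meaningful where a reference volume is known, as with the phantom nodules.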

Keywords: Computed tomography; Infrastructure; Lung cancer; Quantitative imaging; Segmentation.

Publication types

Comparative Study
Multicenter Study

MeSH terms

Algorithms*
Humans
Lung Neoplasms / diagnostic imaging*
Lung Neoplasms / pathology
Phantoms, Imaging
Reproducibility of Results
Solitary Pulmonary Nodule / diagnostic imaging*
Solitary Pulmonary Nodule / pathology
Tomography, X-Ray Computed*
Tumor Burden
