Evaluating the accuracy of lung-RADS score extraction from radiology reports: Manual entry versus natural language processing

Amir Gandomi; Eusha Hasan; Jesse Chusid; Subroto Paul; Matthew Inra; Alex Makhnevich; Suhail Raoof; Gerard Silvestri; Brett C Bade; Stuart L Cohen

doi:10.1016/j.ijmedinf.2024.105580


Int J Med Inform. 2024 Nov:191:105580. doi: 10.1016/j.ijmedinf.2024.105580. Epub 2024 Jul 31.

Authors

Amir Gandomi¹, Eusha Hasan², Jesse Chusid³, Subroto Paul⁴, Matthew Inra⁴, Alex Makhnevich⁵, Suhail Raoof⁶, Gerard Silvestri⁷, Brett C Bade⁸, Stuart L Cohen⁵

Affiliations

¹ Northwell, New Hyde Park, NY, USA; Institute of Health System Science, Feinstein Institutes for Medical Research, Manhasset, NY, USA; Frank G. Zarb School of Business, Hofstra University, Hempstead, NY, USA.
² Northwell, New Hyde Park, NY, USA; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA.
³ Northwell, New Hyde Park, NY, USA; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA; North Shore University Hospital, Northwell, Manhasset, NY, USA.
⁴ Northwell, New Hyde Park, NY, USA; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA; Lenox Hill Hospital, Northwell, New York, NY, USA.
⁵ Northwell, New Hyde Park, NY, USA; Institute of Health System Science, Feinstein Institutes for Medical Research, Manhasset, NY, USA; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA; North Shore University Hospital, Northwell, Manhasset, NY, USA.
⁶ Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA; North Shore University Hospital, Northwell, Manhasset, NY, USA; Lenox Hill Hospital, Northwell, New York, NY, USA.
⁷ Medical University of South Carolina, Charleston, SC, USA.
⁸ Northwell, New Hyde Park, NY, USA; Institute of Health System Science, Feinstein Institutes for Medical Research, Manhasset, NY, USA; Donald and Barbara Zucker School of Medicine at Hofstra/Northwell, Hempstead, NY, USA; Lenox Hill Hospital, Northwell, New York, NY, USA. Electronic address: bbade@northwell.edu.

PMID: 39096594

Abstract

Introduction: Radiology scoring systems are critical to the success of lung cancer screening (LCS) programs, affecting patient care, adherence to follow-up, data management and reporting, and program evaluation. The Lung CT Screening Reporting and Data System (Lung-RADS) is a structured radiology scoring system that provides follow-up recommendations used (a) in clinical care and (b) by LCS programs to monitor adherence to follow-up. Accurate reporting and reliable collection of Lung-RADS scores are therefore fundamental to LCS program evaluation and improvement. However, because radiology reports vary in format and wording, extracting Lung-RADS scores is non-trivial, and no best practices exist. The purpose of this project was to compare methods for extracting Lung-RADS scores from free-text radiology reports.
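Programs derive follow-up due dates from the score, so a missing or wrong Lung-RADS category propagates directly into adherence tracking. The following Python sketch illustrates that dependency under stated assumptions. The interval table reflects the commonly cited Lung-RADS management intervals (12 months for categories 1 and 2, 6 months for category 3, 3 months for category 4A). Months are approximated as 30 days. The names `next_ldct_due` and `is_adherent` and the 90-day grace window are hypothetical choices, not definitions from this study.

```python
from datetime import date, timedelta
from typing import Optional

# Assumed mapping from Lung-RADS category to the recommended interval (months)
# until the next screening LDCT. Categories 4B and 4X call for diagnostic
# work-up rather than a fixed LDCT interval, so they map to None.
FOLLOW_UP_MONTHS = {"1": 12, "2": 12, "3": 6, "4A": 3, "4B": None, "4X": None}


def next_ldct_due(exam_date: date, category: str) -> Optional[date]:
    """Approximate due date of the next LDCT (30-day months); None if no fixed interval."""
    # Unknown categories (e.g. "0" or unparsed text) also yield None.
    months = FOLLOW_UP_MONTHS.get(category.upper())
    if months is None:
        return None
    return exam_date + timedelta(days=30 * months)


def is_adherent(
    exam_date: date,
    category: str,
    next_exam_date: Optional[date],
    grace_days: int = 90,  # hypothetical tolerance window after the due date
) -> Optional[bool]:
    """True/False for fixed-interval categories; None when adherence is not assessable."""
    due = next_ldct_due(exam_date, category)
    if due is None:
        return None  # no fixed-interval rule applies to this category
    if next_exam_date is None:
        return False  # no follow-up exam on record
    return next_exam_date <= due + timedelta(days=grace_days)
```

For example, a category 3 exam on 1 January 2023 gives a due date of 30 June 2023. If that category is missing from the record, `next_ldct_due` cannot compute a date at all.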

Methods: We retrospectively analyzed reports of LCS low-dose computed tomography (LDCT) examinations performed at a multihospital integrated healthcare network in New York State between January 2016 and July 2023. We compared three methods of Lung-RADS score extraction: manual radiologist entry at the time of report creation, manual LCS specialist entry after report creation, and an internally developed, rule-based natural language processing (NLP) algorithm. Accuracy, recall, precision, and completeness (i.e., the proportion of LCS exams to which a Lung-RADS score was assigned) were compared across the three methods.
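The abstract does not publish the rule-based algorithm. The following is a minimal sketch of what a rule-based extractor could look like. It assumes reports state the score in forms such as "Lung-RADS 2", "LungRADS category 4A", or "Lung-RADS v1.1: 3". The pattern, the function name `extract_lung_rads`, and the rule of keeping the last mention are hypothetical. The last-mention rule rests on the further assumption that the impression section follows any references to prior exams.

```python
import re
from typing import Optional

# Hypothetical pattern for a Lung-RADS category mention in free text.
LUNG_RADS_PATTERN = re.compile(
    r"lung[\s-]?rads"                              # "Lung-RADS", "LungRADS", "Lung RADS"
    r"(?:\s*v(?:ersion)?\s*\d+(?:\.\d+)?)?"        # optional version tag: "v1.1", "version 2022"
    r"[\s:=-]*"                                    # optional separators
    r"(?:(?:category|cat\.?|score|assessment)[\s:=-]*)?"  # optional label word
    r"(?P<cat>4[abx]|[0-4])"                       # 4A/4B/4X first, else a single digit 0-4
    r"(?![0-9])",                                  # reject "12-month", "2022", etc.
    re.IGNORECASE,
)


def extract_lung_rads(report: str) -> Optional[str]:
    """Return the last Lung-RADS category mentioned in a report, or None."""
    # With one capture group, findall returns the captured category strings.
    matches = LUNG_RADS_PATTERN.findall(report)
    # Assumption: the final mention (typically in the impression) is the current score.
    return matches[-1].upper() if matches else None
```

Usage: `extract_lung_rads("Prior Lung-RADS 2. Current: LungRADS category 3")` returns `"3"`. Modifiers such as the trailing "S" in "4BS" are ignored, and a report with no numeric category after the Lung-RADS label yields `None`. Such a report would count against completeness.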

Results: The dataset included 24,060 LCS examinations on 14,243 unique patients. The mean patient age was 65 years, and most patients were male (54 %) and white (75 %). Completeness rates were 65 %, 68 %, and 99 % for radiologists' manual entry, LCS specialists' manual entry, and the NLP algorithm, respectively. Accuracy, recall, and precision were high for all extraction methods (>94 %), although the NLP-based approach scored higher than both manual methods on every metric.
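The abstract does not specify how accuracy, precision, and recall were computed for a multi-category score. The following sketch shows one plausible reading, stated as an assumption. Completeness is the share of exams with any score. Accuracy is measured only over scored exams. Precision and recall are macro-averaged per Lung-RADS category against a reference-standard label. The function name `evaluate` and these exact definitions are illustrative and may differ from the study's.

```python
from collections import Counter
from typing import Dict, Optional, Sequence


def evaluate(predicted: Sequence[Optional[str]], reference: Sequence[str]) -> Dict[str, float]:
    """Score one extraction method against reference-standard Lung-RADS categories.

    predicted: one entry per exam; None means the method assigned no score.
    reference: the reference-standard category for the same exams, in the same order.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must be aligned per exam")
    n = len(reference)
    # Exams for which the method produced a score.
    scored = [(p, r) for p, r in zip(predicted, reference) if p is not None]

    completeness = len(scored) / n if n else 0.0
    accuracy = sum(p == r for p, r in scored) / len(scored) if scored else 0.0

    # Per-category true positives, false positives, and false negatives (scored exams only).
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, r in scored:
        if p == r:
            tp[r] += 1
        else:
            fp[p] += 1
            fn[r] += 1

    labels = sorted(set(reference) | {p for p, _ in scored})
    # Skip categories with an empty denominator rather than counting them as zero.
    precisions = [tp[c] / (tp[c] + fp[c]) for c in labels if tp[c] + fp[c]]
    recalls = [tp[c] / (tp[c] + fn[c]) for c in labels if tp[c] + fn[c]]

    return {
        "completeness": completeness,
        "accuracy": accuracy,
        "macro_precision": sum(precisions) / len(precisions) if precisions else 0.0,
        "macro_recall": sum(recalls) / len(recalls) if recalls else 0.0,
    }
```

Keeping completeness separate from accuracy mirrors the abstract's framing: a method can be highly accurate on the exams it scores while still leaving many exams unscored, as the manual-entry completeness rates illustrate.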

Discussion: An NLP-based method of LCS score determination is an efficient means of extracting Lung-RADS scores and is more accurate than manual review and data entry. NLP-based methods should be considered best practice for extracting structured Lung-RADS scores from free-text radiology reports.

Keywords: Follow-up; LC screening; Lung-RADS score; Manual entry; Natural language processing.

Publication types

Comparative Study

MeSH terms

Aged
Early Detection of Cancer
Female
Humans
Lung Neoplasms* / diagnostic imaging
Male
Natural Language Processing*
Radiology Information Systems / standards
Retrospective Studies
Tomography, X-Ray Computed*