Assessing Laterality Errors in Radiology: Comparing Generative Artificial Intelligence and Natural Language Processing

Anjaneya Singh Kathait; Emiliano Garza-Frias; Tejash Sikka; Thomas J Schultz; Bernardo Bizzo; Mannudeep K Kalra; Keith J Dreyer

doi:10.1016/j.jacr.2024.06.014

J Am Coll Radiol. 2024 Oct;21(10):1575-1582. doi: 10.1016/j.jacr.2024.06.014. Epub 2024 Jul 1.

Authors

Anjaneya Singh Kathait¹, Emiliano Garza-Frias², Tejash Sikka³, Thomas J Schultz⁴, Bernardo Bizzo⁵, Mannudeep K Kalra⁶, Keith J Dreyer⁷

Affiliations

¹ Research Fellow, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts. Electronic address: akathait@mgh.harvard.edu.
² Research Fellow, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts; Mass General Brigham AI, Boston, Massachusetts.
³ Research Fellow, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
⁴ Research Fellow, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts; Senior Director, Enterprise Medical Imaging, Mass General Brigham AI, Boston, Massachusetts.
⁵ Research Fellow, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts; Mass General Brigham AI, Boston, Massachusetts; ACR DSI (Data Science Institute) Senior Scientist and the Senior Director, Digital Clinical Research Organization.
⁶ Research Fellow, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts; Scientific Director, Mass General Brigham AI, Boston, Massachusetts.
⁷ Research Fellow, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts; Chief Data Science Officer, Mass General Brigham AI, Boston, Massachusetts; ACR DSI Chief Science Officer; Chief Imaging Information, Mass General Brigham; Vice Chairman of Radiology-Informatics, Massachusetts General Hospital and Brigham and Women's Hospital; and Co-Chair, Mass General Brigham AI Imaging AI Governance Committee.

PMID: 38960083
DOI: 10.1016/j.jacr.2024.06.014

Abstract

Purpose: We compared the performance of generative artificial intelligence (AI) (Augmented Transformer Assisted Radiology Intelligence [ATARI, Microsoft Nuance, Microsoft Corporation, Redmond, Washington]) and natural language processing (NLP) tools for identifying laterality errors in radiology reports and images.

Methods: We used an NLP-based tool (mPower, Microsoft Nuance) to identify radiology reports flagged for laterality errors in its Quality Assurance Dashboard. The NLP model detects and highlights laterality mismatches in radiology reports. From an initial pool of 1,124 radiology reports flagged by the NLP tool for laterality errors, we selected and evaluated 898 reports spanning radiography, CT, MRI, and ultrasound to ensure comprehensive coverage. A radiologist reviewed each report to assess whether the flagged laterality error was present (reporting error; true-positive) or absent (NLP error; false-positive). Next, we applied ATARI to 237 radiology reports and images with consecutive NLP true-positive (118 reports) and false-positive (119 reports) laterality errors. We estimated the accuracy of the NLP and generative AI tools in identifying laterality errors, both overall and by modality.

Results: Among the 898 NLP-flagged reports, 64% (574 of 898) were NLP errors (false-positives) and 36% (324 of 898) were reporting errors (true-positives). The ATARI text query feature correctly identified the absence of laterality mismatch (NLP false-positives) with 97.4% accuracy (115 of 118 reports; 95% confidence interval [CI] = 96.5%-98.3%). Combined vision and text query resulted in 98.3% accuracy (116 of 118 reports or images; 95% CI = 97.6%-99.0%), and vision query alone had 98.3% accuracy (116 of 118 images; 95% CI = 97.6%-99.0%).
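The proportions reported above follow directly from the stated counts. A minimal Python sketch of that arithmetic is given here for readers who want to recompute them; the helper name `pct` and the dictionary labels are illustrative and not taken from the study, and the confidence intervals are not recomputed because the abstract does not state how they were derived.

```python
def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Counts as stated in the abstract; the labels are illustrative only.
counts = {
    "NLP errors among flagged reports": (574, 898),
    "Reporting errors among flagged reports": (324, 898),
    "ATARI text query, correct": (115, 118),
    "ATARI combined vision and text query, correct": (116, 118),
    "ATARI vision query, correct": (116, 118),
}

for label, (numerator, denominator) in counts.items():
    print(f"{label}: {numerator}/{denominator} = {pct(numerator, denominator)}%")
```

Recomputed values can differ from the printed ones in the last digit depending on whether a figure was rounded or truncated. The second row is also the share of NLP flags confirmed by the radiologist, that is, the positive predictive value of the NLP tool within this flagged sample.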

Conclusion: The generative AI-empowered ATARI prototype outperformed the assessed NLP tool in distinguishing true from false laterality errors in radiology reports, while also enabling image-based laterality determination. Residual errors of the ATARI text query on complex radiology reports underscore the need for further improvement of the technology.

Keywords: generative AI; large language models; natural language processing; patient safety; radiology errors.

Publication types

Comparative Study

MeSH terms

Artificial Intelligence*
Diagnostic Errors
Diagnostic Imaging
Humans
Natural Language Processing*
Radiology Information Systems