Probing clarity: AI-generated simplified breast imaging reports for enhanced patient comprehension powered by ChatGPT-4o

Roberto Maroncelli; Veronica Rizzo; Marcella Pasculli; Federica Cicciarelli; Massimo Macera; Francesca Galati; Carlo Catalano; Federica Pediconi

doi:10.1186/s41747-024-00526-1

Eur Radiol Exp. 2024 Oct 30;8(1):124. doi: 10.1186/s41747-024-00526-1.

Authors

Roberto Maroncelli¹, Veronica Rizzo², Marcella Pasculli², Federica Cicciarelli², Massimo Macera³, Francesca Galati², Carlo Catalano², Federica Pediconi²

Affiliations

¹ Department of Radiological, Oncological and Pathological Sciences, Sapienza-University of Rome, Rome, Roma, Italy. roberto.maroncelli@uniroma1.it.
² Department of Radiological, Oncological and Pathological Sciences, Sapienza-University of Rome, Rome, Roma, Italy.
³ Federico II-University of Naples, Naples, Italy.

Abstract

Background: To assess the reliability and comprehensibility of breast radiology reports simplified by artificial intelligence using the large language model (LLM) ChatGPT-4o.

Methods: A radiologist with 20 years' experience selected 21 anonymized breast radiology reports, 7 mammography, 7 breast ultrasound, and 7 breast magnetic resonance imaging (MRI), categorized according to breast imaging reporting and data system (BI-RADS). These reports underwent simplification by prompting ChatGPT-4o with "Explain this medical report to a patient using simple language". Five breast radiologists assessed the quality of these simplified reports for factual accuracy, completeness, and potential harm with a 5-point Likert scale from 1 (strongly agree) to 5 (strongly disagree). Another breast radiologist evaluated the text comprehension of five non-healthcare personnel readers using a 5-point Likert scale from 1 (excellent) to 5 (poor). Descriptive statistics, Cronbach's α, and the Kruskal-Wallis test were used.
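
The two statistics named in the Methods can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names and the ratings below are hypothetical and made up for illustration, not study data. It assumes the common Cronbach's α formula with raters treated as items, and a tie-corrected Kruskal-Wallis H statistic. The p-value shortcut exp(-H/2) holds only under the chi-square approximation with exactly three groups (2 degrees of freedom), as with the three modalities here.

```python
import math
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha; ratings has one row per report, one column per rater."""
    k = len(ratings[0])
    item_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    total_var = variance([sum(row) for row in ratings])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H (undefined if every value is identical)."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    ties = sum(t ** 3 - t for t in counts.values())
    return h / (1 - ties / (n ** 3 - n))


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom only."""
    return math.exp(-h / 2)


# Hypothetical Likert scores (1 = strongly agree ... 5 = strongly disagree).
# Rows are reports, columns are raters.
example_ratings = [[1, 1, 2], [2, 2, 2], [3, 2, 3], [1, 2, 1]]
alpha = cronbach_alpha(example_ratings)

# Hypothetical per-report scores grouped by modality.
example_groups = [[1, 2, 2, 1], [2, 2, 3, 2], [1, 2, 2, 3]]
h_stat = kruskal_wallis_h(example_groups)
p_val = p_value_three_groups(h_stat)
```

Likert data is heavily tied, so the tie correction matters in practice; for other group counts, the p-value would need a general chi-square survival function rather than the closed form used here.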

Results: Across radiologists, simplified mammography, ultrasound, and MRI reports showed high factual accuracy (median 2) and completeness (median 2), with low potential harm scores (median 5); there were no significant group differences (p ≥ 0.780), and internal consistency was high (α > 0.80). Non-healthcare readers showed high comprehension (median 2 for mammography and MRI and 1 for ultrasound); there were no significant differences across modalities (p = 0.368), and internal consistency was high (α > 0.85). BI-RADS 0, 1, and 2 reports were accurately explained, while BI-RADS 3-6 reports were challenging.

Conclusion: The model demonstrated reliability and clarity, offering promise for patients with diverse backgrounds. LLMs like ChatGPT-4o could simplify breast radiology reports, aid in communication, and enhance patient care.

Relevance statement: Simplified breast radiology reports generated by ChatGPT-4o show potential in enhancing communication with patients, improving comprehension across varying educational backgrounds, and contributing to patient-centered care in radiology practice.

Key points: AI simplifies complex breast imaging reports, enhancing patient understanding. Simplified reports from AI maintain accuracy, improving patient comprehension significantly. Implementing AI reports enhances patient engagement and communication in breast imaging.

Keywords: Artificial intelligence; Breast radiology; Large language models; Natural language processing; Patient-centered care.

MeSH terms

Artificial Intelligence*
Breast / diagnostic imaging
Breast Neoplasms / diagnostic imaging
Comprehension*
Female
Humans
Magnetic Resonance Imaging* / methods
Mammography* / methods
Radiology Information Systems
Reproducibility of Results
Ultrasonography, Mammary / methods