Large Language Model Ability to Translate CT and MRI Free-Text Radiology Reports Into Multiple Languages

Aymen Meddeb; Sophia Lüken; Felix Busch; Lisa Adams; Lorenzo Ugga; Emmanouil Koltsakis; Antonios Tzortzakakis; Soumaya Jelassi; Insaf Dkhil; Michail E Klontzas; Matthaios Triantafyllou; Burak Kocak; Sabahattin Yüzkan; Longjiang Zhang; Bin Hu; Anna Andreychenko; Efimtcev Alexander Yurievich; Tatiana Logunova; Wipawee Morakote; Salita Angkurawaranon; Marcus R Makowski; Mike P Wattjes; Renato Cuocolo; Keno Bressem

doi:10.1148/radiol.241736

Radiology. 2024 Dec;313(3):e241736. doi: 10.1148/radiol.241736.

Authors

Aymen Meddeb¹, Sophia Lüken¹, Felix Busch¹, Lisa Adams¹, Lorenzo Ugga¹, Emmanouil Koltsakis¹, Antonios Tzortzakakis¹, Soumaya Jelassi¹, Insaf Dkhil¹, Michail E Klontzas¹, Matthaios Triantafyllou¹, Burak Kocak¹, Sabahattin Yüzkan¹, Longjiang Zhang¹, Bin Hu¹, Anna Andreychenko¹, Efimtcev Alexander Yurievich¹, Tatiana Logunova¹, Wipawee Morakote¹, Salita Angkurawaranon¹, Marcus R Makowski¹, Mike P Wattjes¹, Renato Cuocolo^#¹, Keno Bressem^#¹

Affiliation

¹ From the Departments of Neuroradiology (A.M., M.P.W.) and Radiology (S.L.), Charité-Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany; Department of Neuroradiology, Hôpital Maison-Blanche, CHU Reims, Université Reims-Champagne-Ardenne, 45 Rue Cognacq-Jay, 51092 Reims, France (A.M.); Berlin Institute of Health at Charité-Universitätsmedizin Berlin, Berlin, Germany (A.M.); School of Medicine and Health, Department of Diagnostic and Interventional Radiology, Klinikum rechts der Isar, TUM University Hospital, Technical University of Munich, Munich, Germany (F.B., L.A., M.R.M., K.B.); Department of Advanced Biomedical Sciences, University of Naples Federico II, Naples, Italy (L.U.); Department of Radiology, Karolinska University Hospital, Stockholm, Sweden (E.K.); Department for Clinical Science, Intervention and Technology (CLINTEC), Division of Radiology, Karolinska Institute, Stockholm, Sweden (A.T.); Department of Radiology, National Institute Mongi Ben Hamida of Neurology, Tunis, Tunisia (S.J., I.D.); Department of Radiology, School of Medicine, University of Crete, Heraklion, Greece (M.E.K., M.T.); Computational Biomedicine Laboratory, Institute of Computer Science, Foundation for Research and Technology (FORTH), Heraklion, Greece (M.E.K.); Department of Radiology, University of Health Sciences, Basaksehir Cam and Sakura City Hospital, Basaksehir, Istanbul, Turkey (B.K.); Department of Radiology, Koc University Hospital, Istanbul, Turkey (S.Y.); Department of Radiology, Jinling Hospital, Affiliated Hospital of Medical School of Nanjing University, Nanjing, China (L.Z., B.H.); Laboratory for Digital Public Health Technologies, ITMO University, St Petersburg, Russian Federation (A.A., E.A.Y., T.L.); Department of Radiology, Chiang Mai University, Chiang Mai, Thailand (W.M., S.A.); Department of Medicine, Surgery, and Dentistry, University of Salerno, Baronissi, Italy (R.C.); and School of Medicine and Health, Institute for Cardiovascular Radiology and Nuclear Medicine, German Heart Center Munich, TUM University Hospital, Technical University of Munich, Munich, Germany (K.B.).

^# Contributed equally.

PMID: 39688492

Abstract

Background: High-quality translations of radiology reports are essential for optimal patient care. Because human translators with medical expertise are in limited supply, large language models (LLMs) are a promising solution, but their ability to translate radiology reports remains largely unexplored.

Purpose: To evaluate the accuracy and quality of various LLMs in translating radiology reports across high-resource languages (English, Italian, French, German, and Chinese) and low-resource languages (Swedish, Turkish, Russian, Greek, and Thai).

Materials and Methods: A dataset of 100 synthetic free-text radiology reports from CT and MRI scans was translated into nine target languages by 18 radiologists between January 14 and May 2, 2024. Ten LLMs, including GPT-4 (OpenAI), Llama 3 (Meta), and Mixtral models (Mistral AI), were used for automated translation. Translation accuracy and quality were assessed with use of BiLingual Evaluation Understudy (BLEU) score, translation error rate (TER), and CHaRacter-level F-score (chrF++) metrics. Statistical significance was evaluated with use of paired t tests with Holm-Bonferroni corrections. Radiologists also conducted a qualitative evaluation of the translations with use of a standardized questionnaire.

Results: GPT-4 demonstrated the best overall translation quality, particularly from English to German (BLEU score: 35.0 ± 16.3 [SD]; TER: 61.7 ± 21.2; chrF++: 70.6 ± 9.4), to Greek (BLEU: 32.6 ± 10.1; TER: 52.4 ± 10.6; chrF++: 62.8 ± 6.4), to Thai (BLEU: 53.2 ± 7.3; TER: 74.3 ± 5.2; chrF++: 48.4 ± 6.6), and to Turkish (BLEU: 35.5 ± 6.6; TER: 52.7 ± 7.4; chrF++: 70.7 ± 3.7). GPT-3.5 showed the highest accuracy in translations from English to French, Qwen1.5 excelled in English-to-Chinese translations, and Mixtral 8x22B performed best in Italian-to-English translations. The qualitative evaluation revealed that LLMs excelled in clarity, readability, and consistency with the original meaning but showed only moderate accuracy in medical terminology.

Conclusion: LLMs showed high accuracy and quality for translating radiology reports, although results varied by model and language pair.

© RSNA, 2024

Supplemental material is available for this article.
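The abstract lists chrF++ among the automatic metrics. To illustrate how a character n-gram F-score rewards partial matches, the following is a minimal Python sketch under stated assumptions: it uses character n-grams only (orders 1 to 6, i.e., plain chrF without the word bigrams that chrF++ adds), removes whitespace, weights recall with β = 2, and averages the per-order F-scores (one of several possible aggregation choices). The function names are hypothetical, the abstract does not say which implementation the authors used, and the scores are not expected to match reference tools exactly.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace (hypothetical helper)."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_order: int = 6, beta: float = 2.0) -> float:
    """Simplified sentence-level character n-gram F-score on a 0-100 scale.

    Assumption: the F-beta score is computed per n-gram order and then averaged;
    orders for which either string is too short are skipped.
    """
    b2 = beta ** 2
    scores = []
    for n in range(1, max_order + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # string shorter than n characters
        matches = sum((hyp & ref).values())  # overlap clipped to the smaller count
        precision = matches / sum(hyp.values())
        recall = matches / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
        else:
            # beta > 1 weights recall more heavily than precision
            scores.append((1 + b2) * precision * recall / (b2 * precision + recall))
    return 100 * sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical machine translation compared with a hypothetical human reference.
    print(round(chrf("the liver is normal", "the liver appears normal"), 1))
```

Because matching happens at the character level, a translation that gets a word stem right but an inflection wrong still earns partial credit, which is one reason character-based scores can behave differently from word-based BLEU across morphologically rich target languages.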
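Model comparisons were tested with paired t tests and Holm-Bonferroni correction. The sketch below, assuming per-report metric scores for two models on the same reports, computes the paired t statistic (with n - 1 degrees of freedom) and applies Holm's step-down adjustment to a family of p-values. Turning the t statistic into a p-value requires the Student t distribution (for example, scipy.stats.ttest_rel does both steps); the scores and p-values in the usage example are made up for illustration, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for per-report scores of two models (df = n - 1).

    Assumes both lists are aligned by report and the differences are not all equal.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two aligned lists with at least two reports")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    # Mean difference divided by its standard error.
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def holm_bonferroni(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The p-value with 0-based rank k is multiplied by (m - k), capped at 1,
        # and a running maximum keeps the adjusted values monotone.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[idx]))
        adjusted[idx] = running_max
    return adjusted


if __name__ == "__main__":
    t = paired_t_statistic([1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 1.0, 2.0])
    print(f"t = {t:.3f} with df = 3")
    # Hypothetical raw p-values for three model comparisons.
    print(holm_bonferroni([0.01, 0.04, 0.03]))
```

A comparison is then declared significant when its adjusted p-value is at or below the chosen α. Holm's procedure controls the family-wise error rate like the plain Bonferroni correction but is uniformly at least as powerful.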

MeSH terms

Language
Magnetic Resonance Imaging
Natural Language Processing*
Radiology* / methods
Radiology* / standards
Research Report* / standards
Tomography, X-Ray Computed
Translating*