Objective: To evaluate the efficacy of OpenAI's ChatGPT-4.0 large language model (LLM) in translating technical ophthalmology terminology into more comprehensible language for allied health care professionals, and to compare its performance with that of other LLMs.
Design: Observational cross-sectional study.
Participants: Five ophthalmologists each contributed three clinical encounter notes, totaling 15 reports for analysis.
Methods: Notes were translated into more comprehensible language using ChatGPT-4.0, ChatGPT-4o, Claude 3 Sonnet, and Google Gemini. Ten family physicians, masked to whether the note was original or translated by an LLM, independently evaluated both sets using Likert scales to assess comprehension and utility for clinical decision-making. Readability was evaluated using Flesch Reading Ease and Flesch-Kincaid Grade Level scores. Five ophthalmologist raters compared performance between LLMs and identified translation errors.
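For reference, the two readability metrics named in the Methods follow standard published formulas based on word, sentence, and syllable counts. The sketch below is illustrative only: the abstract does not state which tool produced the scores, the function names are hypothetical, and the counts are assumed to be supplied by the caller (syllable counting in particular varies between tools).

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: higher scores indicate harder text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


if __name__ == "__main__":
    # Hypothetical counts for a short note (not data from the study).
    w, s, syl = 100, 5, 150
    print(round(flesch_reading_ease(w, s, syl), 3))
    print(round(flesch_kincaid_grade(w, s, syl), 2))
```

Under these formulas, a lower Reading Ease score or a higher Grade Level indicates greater linguistic complexity.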
Results: LLM translations significantly outperformed the original notes in terms of comprehension (mean score of 4.7/5.0 vs 3.7/5.0; p < 0.001) and perceived usefulness (mean score of 4.6/5.0 vs 3.8/5.0; p < 0.005). Readability analysis demonstrated mildly increased linguistic complexity in the translated notes. ChatGPT-4.0 was preferred in 8 of 15 cases, ChatGPT-4o in 4, Gemini in 3, and Claude 3 Sonnet in none. All models exhibited some translation errors, but ChatGPT-4o and ChatGPT-4.0 had fewer inaccuracies than the other models.
Conclusions: ChatGPT-4.0 can significantly enhance the comprehensibility of ophthalmic notes, facilitating better interprofessional communication and suggesting a promising role for LLMs in medical translation. However, the results also underscore the need for ongoing refinement and careful implementation of such technologies. Further research is needed to validate these findings across a broader range of specialties and languages.
Copyright © 2024 The Authors. Published by Elsevier Inc. All rights reserved.