Unlocking the future of patient education: ChatGPT vs. LexiComp® as sources of patient education materials

J Am Pharm Assoc (2003). 2024 May 8:102119. doi: 10.1016/j.japh.2024.102119. Online ahead of print.

Abstract

Background: ChatGPT is a conversational artificial intelligence technology that has shown application in various facets of healthcare. With the increased use of AI, it is imperative to assess the accuracy and comprehensibility of AI platforms.

Objective: This pilot project aimed to assess the understandability, readability, and accuracy of ChatGPT as a source of medication-related patient education as compared with an evidence-based medicine tertiary reference resource, LexiComp®.

Methods: Patient education materials (PEMs) were obtained from ChatGPT and LexiComp® for 8 common medications (albuterol, apixaban, atorvastatin, hydrocodone/acetaminophen, insulin glargine, levofloxacin, omeprazole, and sacubitril/valsartan). PEMs were extracted, blinded, and assessed by 2 investigators independently. The primary outcome was a comparison of the Patient Education Materials Assessment Tool-printable (PEMAT-P). Secondary outcomes included Flesch reading ease, Flesch-Kincaid grade level, percent passive sentences, word count, and accuracy. A 7-item accuracy checklist for each medication was generated by expert consensus among pharmacist investigators, with LexiComp® PEMs serving as the control. PEMAT-P interrater reliability was determined via intraclass correlation coefficient (ICC). Flesch reading ease, Flesch-Kincaid grade level, percent passive sentences, and word count were calculated by Microsoft® Word®. Continuous data were assessed using the Student's t-test via SPSS (version 20.0).
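For readers unfamiliar with the two readability metrics named above, the following is a minimal sketch of the standard published Flesch reading ease and Flesch-Kincaid grade level formulas. It is not the authors' procedure: the study used the values reported by Microsoft® Word®, whose sentence and syllable counting may differ from any hand count. The function names and the example counts are hypothetical and chosen only for illustration.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch reading ease formula; higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid formula; output approximates a US school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


# Hypothetical counts for illustration only (not data from the study).
words, sentences, syllables = 100, 5, 150
print(round(flesch_reading_ease(words, sentences, syllables), 1))
print(round(flesch_kincaid_grade(words, sentences, syllables), 1))
```

Both formulas depend only on average sentence length and average syllables per word, so shorter sentences and shorter words lower the grade level.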

Results: No significant difference was found in the PEMAT-P understandability score of PEMs produced by ChatGPT versus LexiComp® [77.9% (11.0) vs. 72.5% (2.4), P = 0.193]. Reading grade level was higher with ChatGPT [8.6 (1.2) vs. 5.6 (0.3), P < 0.001]. ChatGPT PEMs had a lower percentage of passive sentences and lower word count. The average accuracy score of ChatGPT PEMs was 4.25/7 (61%), with scores ranging from 29% to 86%.
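The accuracy percentages follow directly from the 7-item checklist. The sketch below reproduces the arithmetic. The function name is hypothetical, and reading the 29% and 86% endpoints as 2 and 6 correct items out of 7 is an inference from the rounding, not something the abstract states.

```python
CHECKLIST_ITEMS = 7  # size of the accuracy checklist described in the Methods


def accuracy_percent(items_correct: float, total_items: int = CHECKLIST_ITEMS) -> int:
    """Convert a checklist score to a whole-number percentage."""
    return round(100 * items_correct / total_items)


print(accuracy_percent(4.25))  # mean score reported in the abstract
print(accuracy_percent(2))     # assumed lowest score (inferred from 29%)
print(accuracy_percent(6))     # assumed highest score (inferred from 86%)
```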

Conclusion: Despite comparable PEMAT-P scores, ChatGPT PEMs did not meet grade level targets. The lower word count and lower percentage of passive sentences in ChatGPT PEMs could benefit patients, but the variable accuracy scores prevent routine use of ChatGPT to produce medication-related PEMs at this time.