This study evaluates the readability and quality of patient information material about female urinary incontinence (fUI) generated by ten popular artificial intelligence (AI)-supported chatbots. We used the most recent versions of ten widely used chatbots: OpenAI's GPT-4, Claude-3 Sonnet, Grok 1.5, Mistral Large 2, Google Palm 2, Meta's Llama 3, HuggingChat v0.8.4, Microsoft's Copilot, Gemini Advanced, and Perplexity. Prompts were created to generate texts about UI, stress-type UI, urge-type UI, and mixed-type UI. Quality was assessed with the modified Ensuring Quality Information for Patients (mEQIP) tool and the Quality Evaluation Scoring Tool (QUEST); readability was evaluated with the Average Reading Level Consensus (ARLC), the average of eight well-known readability formulas. The mean mEQIP and QUEST scores differed significantly across the ten chatbots (p = 0.049 and p = 0.018, respectively). Gemini received the highest mean mEQIP and QUEST scores, whereas Grok received the lowest. The chatbots also differed significantly in mean ARLC, word count, and sentence count (p = 0.047, p = 0.001, and p = 0.001, respectively). In terms of readability, Grok was the easiest to read, while Mistral's text was highly complex. AI-supported chatbot technology needs to be improved in terms of the readability and quality of patient information on fUI.
Keywords: Artificial Intelligence; Claude; Copilot; Female Urinary Incontinence; GPT-4; Gemini; Google Palm; Grok; HuggingChat; Llama; Mistral; Perplexity.
© 2024. The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.