Artificial Intelligence for Diagnosis in Otologic Patients: Is It Ready to Be Your Doctor?

Camryn Marshall; Jessica Forbes; Michael D Seidman; Luis Roldan; James Atkins

doi:10.1097/MAO.0000000000004267

Otol Neurotol. 2024 Sep 1;45(8):863-869. doi: 10.1097/MAO.0000000000004267.

Authors

Camryn Marshall¹, Jessica Forbes¹, Michael D Seidman, Luis Roldan², James Atkins³

Affiliations

¹ Charles E. Schmidt College of Medicine at Florida Atlantic University, Boca Raton, Florida.
² Advent Health Orlando, Orlando, Florida.
³ Neurotology Advent Health Celebration, Celebration, Florida.

PMID: 39142308

Abstract

Objective: To investigate the diagnostic accuracy of language-model artificial intelligence (AI) by comparing its predictions, based on patient-described symptoms, with diagnoses made by board-certified otologic/neurotologic surgeons.

Study design: Prospective cohort study.

Setting: Tertiary care center.

Patients: One hundred adults participated in the study. These included new patients or established patients returning with new symptoms. Individuals were excluded if they could not provide a written description of their symptoms.

Interventions: Summaries of each patient's symptoms were supplied to three publicly available AI platforms: ChatGPT 4.0, Google Bard, and WebMD "Symptom Checker."

Main outcome measures: Accuracy of the three AI platforms in diagnosing otologic conditions, assessed by comparing each platform's results with the diagnosis made by a neurotologist, first from the same information provided to the AI platforms and again after a complete history and physical examination.

Results: The study includes 100 patients (52 men and 48 women; average age of 59.2 yr). Fleiss' kappa between AI and the physician is -0.103 (p < 0.01). The chi-squared test between AI and the physician is χ² = 12.95 (df = 2; p < 0.001). Fleiss' kappa between AI models is 0.409. Diagnostic accuracies are 22.45%, 12.24%, and 5.10% for ChatGPT 4.0, Google Bard, and WebMD, respectively.
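
For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of the standard Fleiss' kappa calculation. The abstract does not state how the ratings were tabulated, so the function name, the table layout, and the toy counts are illustrative assumptions and are not study data. A value near 1 indicates strong agreement, a value near 0 indicates agreement no better than chance, and a negative value (such as the -0.103 reported between AI and the physician) indicates agreement below chance.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    counts[i][j] is the number of raters who assigned subject i to
    category j. Every row must sum to the same number of raters (>= 2).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each subject needs the same number (>= 2) of ratings")

    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Share of all ratings that fell into each category.
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-subject agreement: fraction of rater pairs that agree.
    p_subj = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    observed = sum(p_subj) / n_subjects      # mean observed agreement
    expected = sum(p * p for p in p_cat)     # agreement expected by chance
    if expected == 1.0:
        raise ValueError("kappa is undefined when all ratings share one category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy table: 4 subjects, 2 raters, 2 diagnostic categories.
toy_counts = [[2, 0], [1, 1], [0, 2], [2, 0]]
toy_kappa = fleiss_kappa(toy_counts)
```

In this layout, two raters who disagree on every subject give a kappa of -1, and raters who always agree give a kappa of 1.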

Conclusions: Contemporary language-model AI platforms can generate extensive differential diagnoses with limited data input. However, doctors can refine these diagnoses through focused history-taking, physical examinations, and clinical experience, skills that current AI platforms lack.

MeSH terms

Adult
Aged
Aged, 80 and over
Artificial Intelligence*
Ear Diseases / diagnosis
Female
Humans
Male
Middle Aged
Prospective Studies