Objective: Large language models (LLMs) in healthcare have the potential to propagate existing biases or introduce new ones. For people with epilepsy, social determinants of health are associated with disparities in access to care, but their impact on seizure outcomes among those with access to specialty care remains unclear. Here we (1) evaluated our validated, epilepsy-specific LLM for intrinsic bias, and (2) used LLM-extracted seizure outcomes to test the hypothesis that seizure outcomes differ across demographic groups.
Methods: First, we tested our LLM for intrinsic bias in the form of differential performance in demographic groups by race, ethnicity, sex, income, and health insurance in manually annotated notes. Next, we used LLM-classified seizure freedom at each office visit to test for outcome disparities in the same demographic groups, using univariable and multivariable analyses.
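A differential-performance check of this kind can be illustrated with a minimal sketch. The abstract does not state which statistical test was used, so the Pearson chi-square statistic below is an assumption chosen for illustration, and the function names, group names, and toy records are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: accuracy} for (group, llm_label, annotated_label) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, llm_label, annotated_label in records:
        total[group] += 1
        correct[group] += int(llm_label == annotated_label)
    return {group: correct[group] / total[group] for group in total}


def chi_square_statistic(records):
    """Pearson chi-square statistic for a groups x (correct, incorrect) table."""
    counts = defaultdict(lambda: [0, 0])  # group -> [correct, incorrect]
    for group, llm_label, annotated_label in records:
        column = 0 if llm_label == annotated_label else 1
        counts[group][column] += 1
    n = sum(sum(row) for row in counts.values())
    column_totals = [sum(row[j] for row in counts.values()) for j in (0, 1)]
    statistic = 0.0
    for row in counts.values():
        for j in (0, 1):
            expected = sum(row) * column_totals[j] / n
            if expected > 0:  # skip an empty column (e.g. no errors at all)
                statistic += (row[j] - expected) ** 2 / expected
    return statistic, len(counts) - 1  # (statistic, degrees of freedom)


# Hypothetical toy records: (group, LLM label, human annotation).
toy_records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 0), ("group_a", 0, 0),
    ("group_b", 1, 1), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]
print(accuracy_by_group(toy_records))
print(chi_square_statistic(toy_records))
```

A statistic near zero relative to its degrees of freedom would be consistent with similar accuracy across groups; a real analysis would also need a p-value and adequate sample sizes per group.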
Results: We analyzed 84,675 clinic visits from 25,612 patients seen at our epilepsy center from 2005 to 2022. We found no differences across demographic groups in the accuracy of outcome classifications or in the balance of positive and negative classifications. Multivariable analysis indicated worse seizure outcomes for female patients (OR 1.33, p = 3×10⁻⁸), those with public insurance (OR 1.53, p = 2×10⁻¹³), and those from lower-income zip codes (OR ≥ 1.22, p ≤ 6.6×10⁻³). Black patients had worse outcomes than White patients in univariable but not multivariable analysis (OR 1.03, p = 0.66).
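For readers less familiar with odds ratios, the following sketch shows how an unadjusted odds ratio and a Wald 95% confidence interval can be computed from a 2×2 table. It is an illustration only: the abstract does not describe the model behind the reported estimates, the function name is hypothetical, and the counts are invented rather than taken from the study. It also assumes the event of interest is a visit without seizure freedom, so that an odds ratio above 1 corresponds to a worse outcome in the comparison group.

```python
import math


def odds_ratio_with_ci(group_events, group_nonevents,
                       reference_events, reference_nonevents, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    All four counts must be positive; z=1.96 gives an approximate 95% interval.
    """
    estimate = (group_events * reference_nonevents) / (
        group_nonevents * reference_events
    )
    # Standard error of the log odds ratio.
    standard_error = math.sqrt(
        1 / group_events + 1 / group_nonevents
        + 1 / reference_events + 1 / reference_nonevents
    )
    lower = math.exp(math.log(estimate) - z * standard_error)
    upper = math.exp(math.log(estimate) + z * standard_error)
    return estimate, lower, upper


# Invented counts for illustration only (not study data):
# events = visits without seizure freedom, nonevents = seizure-free visits.
estimate, lower, upper = odds_ratio_with_ci(20, 10, 10, 10)
print(f"OR = {estimate:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

A multivariable estimate, as reported above, would instead come from a regression model that adjusts for the other demographic variables, which this sketch does not attempt.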
Significance: We found no evidence that our LLM was intrinsically biased against any demographic group. Seizure freedom extracted by LLM revealed disparities in seizure outcomes across several demographic groups. These findings highlight the critical need to reduce disparities in the care of people with epilepsy.
Keywords: Clinical Informatics; Electronic Health Record; Health Disparities; Natural Language Processing.