Background and objective: Diagnostic uncertainty, when unrecognized or poorly communicated, can result in diagnostic error. However, diagnostic uncertainty is challenging to study due to a lack of validated identification methods. This study aims to identify distinct linguistic patterns associated with diagnostic uncertainty in clinical documentation.
Design, setting and participants: This case-control study compares the clinical documentation of hospitalized children who received a novel uncertain diagnosis (UD) label during their admission with that of a set of matched controls. Linguistic analyses identified candidate linguistic indicators (i.e., words or phrases) of diagnostic uncertainty, which a linguist and clinical experts then manually reviewed to select those most relevant to diagnostic uncertainty. A natural language processing program categorized medical terminology into semantic types (e.g., sign or symptom), from which we selected the subset that was both categorized reliably and relevant to diagnostic uncertainty. Finally, a competitive machine learning modeling strategy compared different predictive models, built on the linguistic indicators and semantic types, for identifying diagnostic uncertainty.
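As a rough illustration of how linguistic indicators might be turned into model features, the following minimal sketch counts indicator phrases in a note. The phrase list, the function name `indicator_counts`, and the example note are hypothetical and chosen only for illustration; they are not the indicators or the pipeline used in the study.

```python
import re

# Hypothetical indicator phrases for illustration only;
# these are NOT the indicators identified in the study.
INDICATORS = ["possible", "unclear etiology", "cannot rule out"]

def indicator_counts(note: str) -> dict:
    """Count case-insensitive, whole-phrase occurrences of each indicator."""
    text = note.lower()
    counts = {}
    for phrase in INDICATORS:
        # Word boundaries keep "possible" from matching inside "impossible".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        counts[phrase] = len(re.findall(pattern, text))
    return counts

# Invented example note, not real clinical text.
example_note = "Fever of unclear etiology. Possible viral illness; cannot rule out UTI."
features = indicator_counts(example_note)
```

Per-note counts of this kind could then serve as input columns for a classifier, alongside counts of semantic types.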
Results: Our cohort included 242 UD-labeled patients and 932 matched controls, with a combined total of 3070 clinical notes. The best-performing model was a random forest that used both linguistic indicators and semantic types, yielding a sensitivity of 89.4% and a positive predictive value of 96.7%.
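For readers less familiar with the two reported metrics, the sketch below shows their standard definitions: sensitivity is TP / (TP + FN) and positive predictive value is TP / (TP + FP). The confusion-matrix counts are hypothetical and do not come from the study.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Proportion of truly uncertain cases that the model flags."""
    return tp / (tp + fn)

def positive_predictive_value(tp: int, fp: int) -> float:
    """Proportion of flagged cases that are truly uncertain."""
    return tp / (tp + fp)

# Hypothetical counts for illustration only (not study data).
tp, fn, fp = 90, 10, 5
sens = sensitivity(tp, fn)
ppv = positive_predictive_value(tp, fp)
```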
Conclusion: Expert labeling, natural language processing, and machine learning methods, combined with human validation, produced highly predictive models for detecting diagnostic uncertainty in clinical documentation. This combination represents a promising approach to detecting, studying, and ultimately mitigating diagnostic uncertainty in clinical practice.
© 2023 Society of Hospital Medicine.