Objective: Personal and family history of suicidal thoughts and behaviors (PSH and FSH, respectively) are significant risk factors associated with future suicide events. These are often captured in narrative clinical notes in electronic health records (EHRs). Collaboratively, Weill Cornell Medicine (WCM), Northwestern Medicine (NM), and the University of Florida (UF) developed and validated deep learning (DL)-based natural language processing (NLP) tools to detect PSH and FSH from such notes. The tool's performance was further benchmarked against a method relying exclusively on ICD-9/10 diagnosis codes.
Materials and methods: We developed DL-based NLP tools utilizing pre-trained transformer models Bio_ClinicalBERT and GatorTron, and compared them with expert-informed, rule-based methods. The tools were initially developed and validated using manually annotated clinical notes at WCM. Their portability and performance were further evaluated using clinical notes at NM and UF.
Results: The DL tools outperformed the rule-based NLP tool in identifying PSH and FHS. For detecting PSH, the rule-based system obtained an F1-score of 0.75 ± 0.07, while the Bio_ClinicalBERT and GatorTron DL tools scored 0.83 ± 0.09 and 0.84 ± 0.07, respectively. For detecting FSH, the rule-based NLP tool's F1-score was 0.69 ± 0.11, compared to 0.89 ± 0.10 for Bio_ClinicalBERT and 0.92 ± 0.07 for GatorTron. For the gold standard corpora across the three sites, only 2.2% (WCM), 9.3% (NM), and 7.8% (UF) of patients reported to have an ICD-9/10 diagnosis code for suicidal thoughts and behaviors prior to the clinical notes report date. The best performing GatorTron DL tool identified 93.0% (WCM), 80.4% (NM), and 89.0% (UF) of patients with documented PSH, and 85.0%(WCM), 89.5%(NM), and 100%(UF) of patients with documented FSH in their notes.
Discussion: While PSH and FSH are significant risk factors for future suicide events, little effort has been made previously to identify individuals with these history. To address this, we developed a transformer based DL method and compared with conventional rule-based NLP approach. The varying effectiveness of the rule-based tools across sites suggests a need for improvement in its dictionary-based approach. In contrast, the performances of the DL tools were higher and comparable across sites. Furthermore, DL tools were fine-tuned using only small number of annotated notes at each site, underscores its greater adaptability to local documentation practices and lexical variations.
Conclusion: Variations in local documentation practices across health care systems pose challenges to rule-based NLP tools. In contrast, the developed DL tools can effectively extract PSH and FSH information from unstructured clinical notes. These tools will provide clinicians with crucial information for assessing and treating patients at elevated risk for suicide who are rarely been diagnosed.