Purpose: To develop a natural language processing (NLP) algorithm to identify suicidal ideation/attempts from free-text clinical notes.
Methods: Clinical notes from 2010 to 2018 containing prespecified keywords related to suicidal ideation/attempts were extracted from our organization's electronic health record system. A random sample of 864 clinical notes was selected and divided equally into 4 subsets of 216 notes each. Experienced research chart abstractors reviewed each note and classified it into 1 of 3 categories: current, historical, or no suicidal ideation/attempt. The first 3 subsets were used sequentially to develop the rule-based computerized algorithm, and the fourth subset was used to evaluate its performance. The validated algorithm was then applied to the entire study sample of clinical notes.
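The study's keywords and rules are not listed in this abstract, so the following is only a minimal sketch of the general approach under stated assumptions: a random equal split of the annotated sample into development and validation subsets, and a toy sentence-level rule classifier that assigns each note to one of the three categories (current, historical, no). The keyword, negation, and historical-cue patterns and the function names `split_into_subsets` and `classify_note` are hypothetical and chosen for illustration; they are not the authors' algorithm.

```python
import random
import re

# Hypothetical patterns for illustration only; the study's prespecified
# keywords and rules are not reported in the abstract.
SI_KEYWORDS = re.compile(r"\b(suicidal ideation|suicide attempt|suicidal)\b", re.I)
NEGATION_CUES = re.compile(r"\b(denies|no|negative for)\b", re.I)
HISTORICAL_CUES = re.compile(r"\b(history of|prior|previous|in \d{4})\b", re.I)


def split_into_subsets(notes, n_subsets=4, seed=0):
    """Shuffle notes and divide them into n_subsets equal-sized subsets."""
    if len(notes) % n_subsets != 0:
        raise ValueError("notes cannot be divided equally")
    shuffled = list(notes)
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // n_subsets
    return [shuffled[i * size:(i + 1) * size] for i in range(n_subsets)]


def classify_note(text):
    """Label a note 'current', 'historical', or 'no' using toy sentence-level rules."""
    label = "no"
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not SI_KEYWORDS.search(sentence):
            continue  # sentence does not mention suicidal ideation/attempt
        if NEGATION_CUES.search(sentence):
            continue  # negated mention, e.g. "denies suicidal ideation"
        if HISTORICAL_CUES.search(sentence):
            label = "historical"  # affirmed mention of a past event
        else:
            return "current"  # an affirmed, non-historical mention takes priority
    return label


# 864 annotated notes -> 4 subsets of 216; the first 3 would be used for
# rule development and the fourth held out for validation.
subsets = split_into_subsets(list(range(864)))
print([len(s) for s in subsets])
print(classify_note("History of suicide attempt in 2012. Today endorses suicidal ideation."))
```

Checking for an affirmed current mention before settling on "historical" reflects one plausible design choice: a note that documents both a past attempt and present ideation is treated as current.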
Results: In the validation data set, the computerized algorithm correctly identified 23 of the 26 confirmed current suicidal ideation/attempts and all 10 confirmed historical suicidal ideation/attempts. For current suicidal ideation/attempts, it achieved a sensitivity of 88.5% and a positive predictive value (PPV) of 100.0%; for historical suicidal ideation/attempts, both sensitivity and PPV were 100.0%. Applied to the entire set of study notes, the algorithm identified a total of 1,050,287 current and 293,037 historical ideation/attempt events. Patients with documented current ideation/attempt events were more likely to be female (59.5%), aged 25-44 years (28.3%), and White (43.4%).
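The reported validation metrics follow directly from the confusion-matrix counts. As a small worked check: sensitivity is TP / (TP + FN) and PPV is TP / (TP + FP), so 23 of 26 confirmed current cases gives 23/26, and a 100.0% PPV implies no false positives. The helper names below are illustrative.

```python
def sensitivity(tp, fn):
    """Sensitivity (recall): share of confirmed positives the algorithm found."""
    return tp / (tp + fn)


def ppv(tp, fp):
    """Positive predictive value (precision): share of algorithm positives that were confirmed."""
    return tp / (tp + fp)


# Current ideation/attempt: 23 found, 3 missed; PPV of 100.0% means 0 false positives.
print(f"{sensitivity(23, 3):.1%}")  # 88.5%
print(f"{ppv(23, 0):.1%}")          # 100.0%

# Historical ideation/attempt: all 10 found, no false positives.
print(f"{sensitivity(10, 0):.1%}, {ppv(10, 0):.1%}")  # 100.0%, 100.0%
```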
Conclusion: Our study demonstrated that a computerized algorithm can effectively identify suicidal ideation/attempts from clinical notes. This algorithm can support suicide prevention research programs and patient care quality improvement initiatives.