Aims: Accurate measures of hypoglycemia within electronic health records (EHR) can facilitate clinical population management and research. We quantify the occurrence of serious and mild-to-moderate hypoglycemia in a large EHR database in the US, comparing estimates based only on structured data to those from structured data and natural language processing (NLP) of clinical notes.
Methods: This cohort study included patients with type 2 diabetes identified from January 2009 through March 2014. We compared estimates of occurrence of hypoglycemia derived from diagnostic codes to those recorded within clinical notes and classified via NLP. Measures of hypoglycemia from only structured data (ICD-9 Algorithm), only note mentions (NLP Algorithm), and either structured data or notes (Combined Algorithm) were compared using estimates of the period prevalence, incidence rate, and event rate of hypoglycemia, overall and by seriousness.
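The relationship between the three algorithms can be illustrated with a minimal sketch. It assumes each algorithm reduces to a set of patient identifiers with at least one qualifying event, that the Combined Algorithm is the union of the other two, and that period prevalence is the number of such patients divided by the number of eligible patients. The names `Cohort`, `period_prevalence`, and `summarize`, and the toy data, are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    """Hypothetical container; field names are illustrative assumptions."""
    patients: frozenset  # all eligible patient IDs
    icd9: frozenset      # patients with >= 1 event found via diagnostic codes
    nlp: frozenset       # patients with >= 1 event found via note mentions


def period_prevalence(cases: frozenset, cohort_size: int) -> float:
    """Share of the cohort with at least one event during the study period."""
    return len(cases) / cohort_size


def summarize(cohort: Cohort) -> dict:
    """Period prevalence under each algorithm (Combined = ICD-9 OR NLP)."""
    combined = cohort.icd9 | cohort.nlp  # set union: either source counts
    n = len(cohort.patients)
    return {
        "ICD-9": period_prevalence(cohort.icd9, n),
        "NLP": period_prevalence(cohort.nlp, n),
        "Combined": period_prevalence(combined, n),
    }


# Toy data only: ten patients, with patient 2 found by both sources.
toy = Cohort(
    patients=frozenset(range(1, 11)),
    icd9=frozenset({1, 2}),
    nlp=frozenset({2, 3, 4}),
)
result = summarize(toy)
print(result)
```

Because a patient found by both sources is counted once, the combined prevalence is at most the sum of the two single-source estimates and at least the larger of them.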
Results: Of the 844,683 eligible patients, 119,695 had at least one recorded hypoglycemic event identified with ICD-9 or NLP. The period prevalence of hypoglycemia was 12.4%, 25.1%, and 32.2% for the ICD-9 Algorithm, NLP Algorithm, and Combined Algorithm, respectively. There were 6,128 apparent non-serious events using the ICD-9 Algorithm, which increased to 152,987 non-serious events with the Combined Algorithm.
Conclusions: Ascertainment of events from clinical notes more than doubled the completeness of hypoglycemia capture overall relative to measures from structured data, and increased capture of non-serious events more than 20-fold. The structured data and clinical notes are complementary within the EHR, and both need to be considered in order to fully assess the occurrence of hypoglycemia.
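As a rough arithmetic check on the two ratios stated in the Conclusions, using only figures reported in the Results, and assuming that overall completeness of capture is read as the ratio of Combined to ICD-9 period prevalence:

```latex
\[
\frac{32.2\%}{12.4\%} \approx 2.6
\qquad\text{and}\qquad
\frac{152{,}987}{6{,}128} \approx 25.0
\]
```

The first ratio is consistent with "more than doubled" and the second with "more than 20-fold".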
Keywords: Electronic health records; Hypoglycemia; Mild hypoglycemia; Natural language processing; Severe hypoglycemia; Type 2 diabetes.
Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.