Post-concussion syndrome (PCS) is characterized by persistent cognitive, somatic, and emotional symptoms after a mild traumatic brain injury (mTBI). Genetic and other biological variables may contribute to PCS etiology, and the emergence of biobanks linked to electronic health records (EHRs) offers new opportunities for research on PCS. We sought to validate the EHR data of PCS patients by comparing two diagnostic algorithms deployed in the Vanderbilt University Medical Center de-identified database of 2.8 million patient EHRs. The algorithms identified individuals with PCS by: 1) natural language processing (NLP) of narrative text in the EHR combined with structured demographic, diagnostic, and encounter data; or 2) coded billing and procedure data. The predictive value of each algorithm was assessed, and cases and controls identified by each approach were compared on demographic and medical characteristics. The NLP algorithm identified 507 cases and 10,857 controls. The negative predictive value (NPV) in controls was 78% and the positive predictive value (PPV) in cases was 82%. By comparison, the coded algorithm identified 1,142 patients with two or more PCS billing codes and had a PPV of 76%. Comparisons of PCS controls to both case groups recovered known epidemiology of PCS: cases were more likely than controls to be female and to have pre-morbid diagnoses of anxiety, migraine, and post-traumatic stress disorder. In contrast, controls and cases were equally likely to have attention-deficit/hyperactivity disorder and learning disabilities, in accordance with the findings of recent systematic reviews of PCS risk factors. We conclude that EHRs are a valuable research tool for PCS. Ascertainment based on coded data alone had a predictive value comparable to that of the NLP algorithm, recovered known PCS risk factors, and maximized the number of included patients.
Keywords: diagnostic algorithm; electronic health records; post-concussion syndrome.
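
The coded ascertainment rule and the predictive values described above can be illustrated with a minimal sketch. It assumes that PCS billing codes are ICD-9-CM 310.2 and ICD-10-CM F07.81 (the abstract does not list the codes used), that billing data arrive as (patient ID, code) pairs, and that predictive value is the fraction of manually reviewed records confirmed on chart review. The function names and the toy rows are hypothetical and are not study data.

```python
from collections import Counter

# Assumption: these are the postconcussional syndrome codes; the abstract
# does not state which codes the coded algorithm used.
PCS_CODES = {"310.2", "F07.81"}


def coded_cases(billing_rows, min_codes=2):
    """Return the IDs of patients with at least `min_codes` PCS billing codes."""
    counts = Counter(pid for pid, code in billing_rows if code in PCS_CODES)
    return {pid for pid, n in counts.items() if n >= min_codes}


def predictive_value(confirmed, reviewed):
    """Fraction of reviewed records whose algorithm label was confirmed.

    Applied to reviewed cases this is the PPV; applied to reviewed
    controls it is the NPV.
    """
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    return confirmed / reviewed


# Hypothetical toy billing rows (not from the study).
rows = [
    ("p1", "310.2"), ("p1", "F07.81"),    # two PCS codes: case
    ("p2", "310.2"),                      # one PCS code: not a case
    ("p3", "S06.0X0A"), ("p3", "310.2"),  # only one of these is a PCS code
]
cases = coded_cases(rows)
```

With these toy rows only `p1` meets the two-code threshold. Whether repeated codes must fall on separate encounter dates is not specified in the abstract, so the sketch counts every matching row.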