Using natural language processing for identification of herpes zoster ophthalmicus cases to support population-based study

Clin Exp Ophthalmol. 2019 Jan;47(1):7-14. doi: 10.1111/ceo.13340. Epub 2018 Jul 4.

Abstract

Importance: Diagnosis codes are inadequate for accurately identifying herpes zoster (HZ) ophthalmicus (HZO). Population-based studies of HZO are scarce because manual review of medical records is expensive.

Background: To assess whether HZO can be identified from clinical notes using natural language processing (NLP), and to investigate the epidemiology of HZO in the HZ population using the developed approach.

Design: A retrospective cohort analysis.

Participants: A total of 49 914 southern California residents aged over 18 years who had a new diagnosis of HZ.

Methods: An NLP-based algorithm was developed and validated against a manually curated validation data set (n = 461). The algorithm was then applied to over 1 million clinical notes associated with the study population. HZO and non-HZO cases were compared by age, sex, race and co-morbidities.
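The abstract does not describe the algorithm's rules, so the following is only a minimal sketch of one common approach to this kind of task: keyword patterns combined with a simple negation check, applied note by note and then rolled up to the patient level. The patterns, negation cues and the functions `note_mentions_hzo` and `classify_patient` are hypothetical illustrations, not the published method.

```python
import re

# HYPOTHETICAL patterns for illustration only; not the published algorithm's rules.
HZO_PATTERNS = [
    r"\bherpes zoster ophthalmicus\b",
    r"\bhzo\b",
    # "zoster" followed within 40 characters by an eye-related term
    r"\bzoster\b.{0,40}\b(eye|ocular|ophthalmic|v1)\b",
]

# HYPOTHETICAL negation cues, checked only in the text preceding a match
NEGATION_CUES = re.compile(r"\b(no|denies|without|negative for|ruled out)\b")


def note_mentions_hzo(note: str) -> bool:
    """Return True if any sentence has an HZO pattern not preceded by a negation cue."""
    for sentence in re.split(r"[.\n]", note.lower()):
        for pattern in HZO_PATTERNS:
            match = re.search(pattern, sentence)
            if match and not NEGATION_CUES.search(sentence[: match.start()]):
                return True
    return False


def classify_patient(notes: list[str]) -> bool:
    """Flag a patient as an HZO case if any of their notes has a non-negated mention."""
    return any(note_mentions_hzo(note) for note in notes)


if __name__ == "__main__":
    print(classify_patient(["Routine visit.", "Zoster rash involving the left eye."]))
    print(classify_patient(["Shingles on trunk. No evidence of HZO."]))
```

Aggregating at the patient level means a single positive note is enough to flag a case; a real system would also need to handle history-of mentions, family history and templated text, which this sketch ignores.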

Main outcome measures: We measured the accuracy of the NLP algorithm.

Results: The NLP algorithm achieved 95.6% sensitivity and 99.3% specificity. Compared with diagnosis codes, NLP identified significantly more HZO cases in the HZ population (13.9% vs. 1.7%). Compared with the non-HZO group, the HZO group was older, included more males and more White patients, and had more outpatient visits.
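Sensitivity and specificity are computed by cross-tabulating the algorithm's output against the manually reviewed reference labels: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The sketch below shows that calculation; the names `ValidationCounts` and `tally` and the toy labels in the usage example are made up for illustration and do not reproduce the study's validation data.

```python
from dataclasses import dataclass


@dataclass
class ValidationCounts:
    tp: int  # algorithm positive, chart review positive
    fp: int  # algorithm positive, chart review negative
    tn: int  # algorithm negative, chart review negative
    fn: int  # algorithm negative, chart review positive


def tally(predicted: list[bool], reference: list[bool]) -> ValidationCounts:
    """Cross-tabulate algorithm output against manually reviewed reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    pairs = list(zip(predicted, reference))
    return ValidationCounts(
        tp=sum(p and r for p, r in pairs),
        fp=sum(p and not r for p, r in pairs),
        tn=sum(not p and not r for p, r in pairs),
        fn=sum(not p and r for p, r in pairs),
    )


def sensitivity(c: ValidationCounts) -> float:
    """Share of reference-positive cases the algorithm flags: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ValidationCounts) -> float:
    """Share of reference-negative cases the algorithm leaves unflagged: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


if __name__ == "__main__":
    # Toy labels, not study data
    counts = tally([True, True, False, False, True], [True, False, False, True, True])
    print(counts, sensitivity(counts), specificity(counts))
```

With the toy labels above, two of three reference positives are flagged (sensitivity 2/3) and one of two reference negatives is correctly left unflagged (specificity 0.5).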

Conclusions and relevance: We developed and validated an automatic method to identify HZO cases with high accuracy. The findings of this study, one of the largest on HZO, emphasize the importance of preventing HZ in the elderly population. This method can be a valuable tool to support population-based studies and clinical care of HZO in the era of big data.

Keywords: HZO; diagnosis code; electronic medical record; epidemiology; natural language processing.

Publication types

  • Multicenter Study

MeSH terms

  • Adolescent
  • Adult
  • Aged
  • Aged, 80 and over
  • Algorithms*
  • Eye Infections, Viral / diagnosis*
  • Eye Infections, Viral / virology
  • Female
  • Follow-Up Studies
  • Herpes Zoster Ophthalmicus / diagnosis*
  • Herpes Zoster Ophthalmicus / virology
  • Herpesvirus 3, Human*
  • Humans
  • Male
  • Middle Aged
  • Natural Language Processing*
  • Population Surveillance / methods*
  • ROC Curve
  • Reproducibility of Results
  • Retrospective Studies
  • Young Adult