Inductive creation of an annotation schema and a reference standard for de-identification of VA electronic clinical notes

AMIA Annu Symp Proc. 2009 Nov 14:2009:416-20.

Abstract

Accessing both structured and unstructured clinical data is a high priority for research efforts. However, HIPAA requires that data meet or exceed a de-identification standard to ensure that protected health information (PHI) is removed. This is a particularly difficult problem for unstructured clinical free text, although natural language processing (NLP) systems can be trained to de-identify clinical text automatically. Evaluating such systems, however, requires reference standards built through manual human annotation of clinical notes, which is a costly and time-consuming process. An annotation schema is therefore needed that can be used to build reliable and valid reference standards for evaluating NLP systems on the de-identification task. We describe the inductive creation of an annotation schema and a subsequent reference standard. We also provide estimates of the accuracy of human annotators for this particular task.
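The abstract does not say how annotator accuracy was measured. As a rough illustration of what comparing an annotator against a reference standard can look like, the sketch below scores annotated PHI spans with exact-match precision, recall, and F1. It assumes that each PHI mention is represented as a character-offset span with a category label. The `Span` type, the `span_scores` function, and the category names are all hypothetical and are not the authors' schema or method. Exact matching is only one possible criterion, since partial-overlap matching is also common.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical PHI annotation: character offsets plus a category label."""
    start: int  # offset of the first character of the PHI mention
    end: int    # offset one past the last character
    label: str  # illustrative category, e.g. "NAME" or "DATE"


def span_scores(reference: set[Span], annotator: set[Span]) -> dict[str, float]:
    """Exact-match precision, recall and F1 of annotator spans vs. a reference.

    A span counts as correct only if start, end and label all match.
    """
    tp = len(reference & annotator)   # spans found by both
    fp = len(annotator - reference)   # annotator spans not in the reference
    fn = len(reference - annotator)   # reference spans the annotator missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data for illustration only; not from the study.
    reference = {Span(0, 10, "NAME"), Span(25, 35, "DATE"), Span(50, 55, "LOCATION")}
    annotator = {Span(0, 10, "NAME"), Span(25, 35, "DATE"), Span(60, 64, "NAME")}
    print(span_scores(reference, annotator))
```

In this toy example the annotator finds two of the three reference spans and adds one spurious span, so precision, recall, and F1 are each 2/3. For de-identification, recall is usually the figure to watch most closely, because every missed span is PHI that could be left in the released text.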

MeSH terms

  • Confidentiality / standards*
  • Health Insurance Portability and Accountability Act
  • Hospitals, Veterans
  • Humans
  • Medical Records Systems, Computerized*
  • Natural Language Processing*
  • United States