Inductive creation of an annotation schema and a reference standard for de-identification of VA electronic clinical notes

Jeanmarie Mayer; Shuying Shen; Brett R South; Stephane Meystre; F Jeff Friedlin; William R Ray; Matthew Samore

AMIA Annu Symp Proc. 2009 Nov 14:2009:416-20.

Authors

Jeanmarie Mayer¹, Shuying Shen, Brett R South, Stephane Meystre, F Jeff Friedlin, William R Ray, Matthew Samore

Affiliation

¹ IDEAS Center SLCVA Healthcare System, Salt Lake City, Utah, USA.

PMID: 20351891
PMCID: PMC2815367

Abstract

Access to both structured and unstructured clinical data is a high priority for research efforts. However, HIPAA requires that data meet or exceed a de-identification standard to ensure that protected health information (PHI) is removed. This is particularly difficult for unstructured clinical free text, although natural language processing (NLP) systems can be trained to de-identify clinical text automatically. Evaluating such systems requires reference standards, and manually annotating clinical notes to build them is costly and time-consuming. An annotation schema must therefore be created that can be used to build reliable and valid reference standards for evaluating NLP systems on the de-identification task. We describe the inductive creation of an annotation schema and the subsequent reference standard. We also provide estimates of the accuracy of human annotators for this task.
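As an illustration of how annotator accuracy against a reference standard might be estimated, the following is a minimal sketch, not the procedure used in the paper. It assumes PHI annotations are stored as typed character spans and that only exact matches (same note, offsets, and type) count as correct. The `Span` structure, the `score_annotator` function, the PHI type labels, and the sample data are all hypothetical.

```python
from typing import Iterable, NamedTuple


class Span(NamedTuple):
    """One PHI annotation: a typed character range within a note (hypothetical layout)."""
    note_id: str
    start: int      # inclusive character offset
    end: int        # exclusive character offset
    phi_type: str   # illustrative label such as "NAME" or "DATE"


def score_annotator(annotator: Iterable[Span], reference: Iterable[Span]) -> dict:
    """Compare one annotator's spans with the reference standard by exact match."""
    ann, ref = set(annotator), set(reference)
    true_pos = len(ann & ref)  # spans present in both sets
    precision = true_pos / len(ann) if ann else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example data: four reference spans, three annotator spans.
reference = [
    Span("note1", 0, 10, "NAME"),
    Span("note1", 25, 35, "DATE"),
    Span("note2", 5, 17, "NAME"),
    Span("note2", 40, 52, "PHONE"),
]
annotator = [
    Span("note1", 0, 10, "NAME"),   # matches the reference
    Span("note2", 5, 17, "NAME"),   # matches the reference
    Span("note2", 60, 66, "DATE"),  # not in the reference (false positive)
]

scores = score_annotator(annotator, reference)
print(scores)
```

Exact matching is the strictest choice. A scheme that credits partially overlapping spans, or that ignores the PHI type, would give different estimates, so the matching criterion should be stated alongside any reported figures.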

MeSH terms

Confidentiality / standards*
Health Insurance Portability and Accountability Act
Hospitals, Veterans
Humans
Medical Records Systems, Computerized*
Natural Language Processing*
United States