Naïve Electronic Health Record phenotype identification for Rheumatoid arthritis

AMIA Annu Symp Proc. 2011;2011:189-96. Epub 2011 Oct 22.

Abstract

Electronic Health Records (EHRs) provide a real-world patient cohort for clinical and genomic research. Phenotype identification using informatics algorithms has been shown to replicate known genetic associations found in clinical trials and observational cohorts. However, developing accurate phenotype identification methods can be challenging and can require significant time and effort. We applied Support Vector Machines (SVMs) to both naïve (i.e., non-curated) and expert-defined collections of EHR features to identify Rheumatoid Arthritis cases using billing codes, medication exposures, and natural language processing-derived concepts. SVMs trained on naïve and expert-defined data outperformed an existing deterministic algorithm; the best-performing naïve system had a precision of 0.94 and a recall of 0.87, compared to a precision of 0.75 and a recall of 0.51 for the deterministic algorithm. We show that, with an expert-defined feature set, as few as 50-100 training samples are required. This study demonstrates that SVMs operating on non-curated sets of attributes can accurately identify cases from an EHR.
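
The precision and recall figures reported above follow the standard definitions: precision is the fraction of predicted cases that are true cases, and recall is the fraction of true cases that the classifier finds. The sketch below shows how these two metrics could be computed from binary labels. The function name `precision_recall` and the toy labels are hypothetical illustrations and are not taken from the study's data or code.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 marks a case."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels (assumption: NOT data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]  # chart-review gold standard
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]  # classifier output
precision, recall = precision_recall(y_true, y_pred)
```

In this toy example, three of the four predicted cases are correct and three of the four true cases are found, so both metrics equal 0.75.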

Publication types

  • Evaluation Study
  • Research Support, N.I.H., Extramural

MeSH terms

  • Arthritis, Rheumatoid / diagnosis*
  • Electronic Health Records*
  • Humans
  • Information Storage and Retrieval / methods*
  • Phenotype
  • Support Vector Machine*