We demonstrate the utility of concept lexicon expansion and evaluation on enriched samples of patients and documents, using sexual orientation as a use case for rare event detection in electronic medical records. This approach yielded 7 additional words and 21 misspellings beyond our initial set of 5 seed words. The expanded vocabulary can support further development of a full natural language processing system that identifies instances where sexual orientation is documented.
Keywords: Electronic Health Records; Natural Language Processing.