Objective: To study the relation between electronic health record (EHR) variables and healthcare process events.
Materials and methods: Lagged linear correlation was calculated between five healthcare process events and 84 EHR variables (24 clinical laboratory values and 60 clinical concepts extracted from clinical notes) in a 24-year database. The EHR variables were clustered for each healthcare process event and interpreted.
Results: Laboratory tests tended to cluster together and note concepts tended to cluster together. Within each of those two classes, the variables clustered into clinically sensible groupings. The exact groupings varied from one healthcare process event to another, with the largest differences occurring between inpatient events and outpatient events.
Discussion: Unlike previously reported pairwise associations between variables, which highlighted correlations across the laboratory-clinical note divide, the associations obtained by incorporating healthcare process events appeared to be sensitive to the manner in which the variables were collected.
Conclusion: We believe that it may be possible to exploit this sensitivity to help knowledge engineers select variables and correct for biases.
Keywords: Data Mining; Electronic Health Record; Phenotype.
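
To make the lagged linear correlation named under Materials and methods concrete, the following is a minimal sketch, not the authors' implementation. It assumes each healthcare process event and each EHR variable has already been reduced to an equally spaced time series (for example, daily counts), and it uses Pearson correlation as the linear measure. The function names, the toy series, and the lag range are hypothetical; the abstract does not state the binning or lag window used, and the subsequent clustering step is not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / sqrt(sxx * syy)


def lagged_correlation(event, variable, max_lag):
    """Map each lag in [-max_lag, max_lag] to corr(event[t], variable[t + lag])."""
    if len(event) != len(variable):
        raise ValueError("series must have the same length")
    n = len(event)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            e, v = event[: n - lag], variable[lag:]
        else:
            e, v = event[-lag:], variable[: n + lag]
        result[lag] = pearson(e, v)
    return result


# Hypothetical toy data: the variable repeats the event series one step later.
event = [2, 5, 1, 0, 3, 7, 4, 1, 0, 2, 6, 3]
variable = [1, 2, 5, 1, 0, 3, 7, 4, 1, 0, 2, 6]

corr = lagged_correlation(event, variable, max_lag=2)
best_lag = max(corr, key=corr.get)  # lag 1 recovers the built-in shift
```

Under these assumptions, the vector of correlations across lags for one event and one variable could serve as that variable's profile, and profiles could then be compared to group variables per event.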