TransformEHR: transformer-based encoder-decoder generative model to enhance prediction of disease outcomes using electronic health records

Nat Commun. 2023 Nov 29;14(1):7857. doi: 10.1038/s41467-023-43715-z.

Abstract

Deep learning transformer-based models using longitudinal electronic health records (EHRs) have shown great success in predicting clinical diseases and outcomes. Pretraining on a large dataset helps such models map the input space better and boosts their performance on relevant tasks when finetuned with limited data. In this study, we present TransformEHR, a transformer-based generative encoder-decoder model pretrained with a new objective: predicting all diseases and outcomes of a patient at a future visit from previous visits. TransformEHR's encoder-decoder framework, paired with this novel pretraining objective, helps it achieve new state-of-the-art performance on multiple clinical prediction tasks. Compared with the previous model, TransformEHR improves the area under the precision-recall curve by 2% (p < 0.001) for pancreatic cancer onset and by 24% (p = 0.007) for intentional self-harm in patients with post-traumatic stress disorder. The high performance in predicting intentional self-harm shows the potential of TransformEHR for building effective clinical intervention systems. TransformEHR is also generalizable and can be easily finetuned for clinical prediction tasks with limited data.
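To make the pretraining objective concrete, the sketch below shows one way such a visit-level generation task could be set up with a generic encoder-decoder transformer in PyTorch. This is an illustrative assumption, not the authors' implementation: the tokenization (one token per ICD code), vocabulary size, special tokens, model dimensions, and the use of torch.nn.Transformer are all placeholders for the paper's own architecture and preprocessing.

```python
# Illustrative sketch (not the authors' code) of a TransformEHR-style
# pretraining objective: given the ICD codes recorded at a patient's previous
# visits, an encoder-decoder transformer is trained to generate every code of
# a future visit.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000     # assumed size of the ICD-code vocabulary (plus specials)
PAD, BOS = 0, 1       # assumed padding / beginning-of-visit token ids
D_MODEL = 128

class EHRSeq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL, padding_idx=PAD)
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=4,
            num_encoder_layers=2, num_decoder_layers=2,
            batch_first=True,
        )
        self.lm_head = nn.Linear(D_MODEL, VOCAB_SIZE)

    def forward(self, prev_visit_codes, next_visit_codes_in):
        # prev_visit_codes:    (batch, src_len) code tokens from earlier visits
        # next_visit_codes_in: (batch, tgt_len) future-visit tokens, shifted
        #                      right so position t is predicted from tokens < t
        causal_mask = self.transformer.generate_square_subsequent_mask(
            next_visit_codes_in.size(1))
        hidden = self.transformer(
            self.embed(prev_visit_codes),
            self.embed(next_visit_codes_in),
            tgt_mask=causal_mask,
        )
        return self.lm_head(hidden)   # logits over the ICD-code vocabulary

# Toy pretraining step: predict all codes of the future visit from prior visits.
model = EHRSeq2Seq()
src = torch.randint(2, VOCAB_SIZE, (8, 30))   # codes from previous visits
tgt = torch.randint(2, VOCAB_SIZE, (8, 10))   # codes of the future visit
decoder_in = torch.cat([torch.full((8, 1), BOS), tgt[:, :-1]], dim=1)
logits = model(src, decoder_in)
loss = nn.functional.cross_entropy(
    logits.reshape(-1, VOCAB_SIZE), tgt.reshape(-1), ignore_index=PAD)
loss.backward()
```

In this kind of setup, finetuning for a specific outcome would reuse the pretrained encoder-decoder weights and continue training on the limited task-specific data, consistent with the transfer-learning use described in the abstract.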

MeSH terms

  • Electric Power Supplies
  • Electronic Health Records
  • Humans
  • Mental Recall
  • Pancreatic Neoplasms*
  • Stress Disorders, Post-Traumatic*