Targeted Development and Validation of Clinical Prediction Models in Secondary Care Settings: Opportunities and Challenges for Electronic Health Record Data

JMIR Med Inform. 2024 Oct 24;12:e57035. doi: 10.2196/57035.

Abstract

Before deploying a clinical prediction model (CPM) in clinical practice, its performance needs to be demonstrated in the population of intended use. This is also called "targeted validation." Many CPMs developed in tertiary settings may be most useful in secondary care, where the patient case mix is broad and practitioners need to triage patients efficiently. However, since structured or rich datasets of sufficient quality from secondary care to assess the performance of a CPM are scarce, a validation gap exists that hampers the implementation of CPMs in secondary care settings. In this viewpoint, we highlight the importance of targeted validation and the use of CPMs in secondary care settings and discuss the potential and challenges of using electronic health record (EHR) data to overcome the existing validation gap. The introduction of software applications for text mining of EHRs allows the generation of structured "big" datasets, but the imperfection of EHRs as a research database requires careful validation of data quality. When using EHR data for the development and validation of CPMs, we propose considering three practical steps in addition to widely accepted checklists: (1) involve a local EHR expert (clinician or nurse) in the data extraction process, (2) perform validity checks on the generated datasets, and (3) provide metadata on how variables were constructed from EHRs. These steps help to generate EHR datasets that are statistically powerful, of sufficient quality, and replicable, and enable targeted development and validation of CPMs in secondary care settings. This approach can fill a major gap in prediction modeling research and appropriately advance CPMs into clinical practice.
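
As an illustration of steps (2) and (3) above, the following minimal Python sketch shows one way validity checks and variable metadata might be kept together. It is not the authors' method. The names `VariableSpec` and `validity_report`, the variable names, the plausibility bounds, and the three example rows are all hypothetical and invented for illustration; in practice the bounds and construction rules would be agreed with the local EHR expert from step (1).

```python
from dataclasses import dataclass


@dataclass
class VariableSpec:
    """Metadata for one variable derived from the EHR (hypothetical schema)."""
    name: str
    source: str   # where in the EHR the value came from
    rule: str     # how the variable was constructed
    low: float    # assumed lower plausibility bound
    high: float   # assumed upper plausibility bound


def validity_report(rows, specs):
    """Count missing and out-of-range values for each specified variable."""
    report = {}
    for spec in specs:
        values = [row.get(spec.name) for row in rows]
        missing = sum(v is None for v in values)
        out_of_range = sum(
            v is not None and not (spec.low <= v <= spec.high) for v in values
        )
        report[spec.name] = {
            "n": len(values),
            "missing": missing,
            "out_of_range": out_of_range,
        }
    return report


# Invented metadata and synthetic rows, for illustration only.
specs = [
    VariableSpec("age_years", "demographics table",
                 "visit date minus birth date", 0, 120),
    VariableSpec("systolic_bp", "text-mined triage note",
                 "first numeric value following a blood pressure label", 50, 300),
]
rows = [
    {"age_years": 67, "systolic_bp": 135},
    {"age_years": 54, "systolic_bp": None},
    {"age_years": 203, "systolic_bp": 128},
]

report = validity_report(rows, specs)
for name, counts in report.items():
    print(name, counts)
```

Keeping the plausibility bounds in the same record as the construction rule means the metadata published alongside a dataset also documents which checks were applied to each variable.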

Keywords: AI; CPM; EHR; EMR; artificial intelligence; clinical prediction model; electronic health record; machine learning; prediction models; secondary care; targeted validation; validation.

Publication types

  • Validation Study

MeSH terms

  • Data Mining / methods
  • Electronic Health Records*
  • Humans
  • Reproducibility of Results
  • Secondary Care*