Integrating Information from Existing Risk Prediction Models with No Model Details

Can J Stat. 2023 Jun;51(2):355-374. doi: 10.1002/cjs.11701. Epub 2022 Apr 15.

Abstract

Consider the setting where (i) individual-level data are collected to build a regression model for the association between an event of interest and certain covariates, and (ii) some risk calculators predicting the risk of the event using less detailed covariates are available, possibly as algorithmic black boxes with little information available about how they were built. We propose a general empirical-likelihood-based framework to integrate the rich auxiliary information contained in the calculators into fitting the regression model, to make the estimation of regression parameters more efficient. Two methods are developed, one using working models to extract the calculator information and one making direct use of calculator predictions without working models. Theoretical and numerical investigations show that the calculator information can substantially reduce the variance of regression parameter estimation. As an application, we study the dependence of the risk of high-grade prostate cancer on both conventional risk factors and newly identified molecular biomarkers by integrating information from the Prostate Biopsy Collaborative Group (PBCG) risk calculator, which was built based on conventional risk factors alone.
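
As a rough illustration of how an empirical-likelihood framework can absorb calculator predictions, the generic sketch below may help. It is not taken from the paper, and all notation is assumed for illustration: Y is the binary event, X the less detailed covariates used by the calculator, Z the additional covariates, s the score function of the regression model with parameter \beta, r(X) the calculator's predicted risk, and h(X) a user-chosen vector of functions of X. The last constraint also assumes that the calculator is calibrated for the study population, so that E[Y | X] = r(X).

```latex
% Generic empirical-likelihood sketch (illustrative; notation is assumed,
% not the paper's exact formulation).
\begin{align*}
\max_{\beta,\; p_1,\dots,p_n}\quad & \sum_{i=1}^{n} \log p_i \\
\text{subject to}\quad & p_i \ge 0, \qquad \sum_{i=1}^{n} p_i = 1, \\
% estimating equation of the regression model fitted to individual-level data
& \sum_{i=1}^{n} p_i \, s(Y_i, X_i, Z_i; \beta) = 0, \\
% auxiliary constraint carrying the calculator information (assumes calibration)
& \sum_{i=1}^{n} p_i \, h(X_i)\,\bigl\{ Y_i - r(X_i) \bigr\} = 0 .
\end{align*}
```

Under these assumptions the auxiliary constraint contains no unknown parameters, so it does not change what is being estimated. Any efficiency gain would come from its correlation with the score s, which is the usual mechanism by which extra moment conditions reduce variance in empirical-likelihood estimation.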

Keywords: Data integration; Empirical likelihood; Estimating equations; Estimation efficiency; External information.

Mathematics subject classification: Primary 62F12; secondary 62J12.