Informing a Risk Prediction Model for Binary Outcomes with External Coefficient Information

J R Stat Soc Ser C Appl Stat. 2019 Jan;68(1):121-139. doi: 10.1111/rssc.12306. Epub 2018 Aug 13.

Abstract

We consider a situation where rich historical data are available for the coefficients and their standard errors in an established regression model describing the association between a binary outcome variable Y and a set of predicting factors X, from a large study. We would like to utilize this summary information to improve estimation and prediction in an expanded model of interest, Y | X, B. The additional variable B is a new biomarker, measured on a small number of subjects in a new dataset. We develop and evaluate several approaches for translating the external information into constraints on regression coefficients in a logistic regression model of Y | X, B. Borrowing from the measurement error literature, we establish an approximate relationship between the regression coefficients in the models Pr(Y = 1 | X, β), Pr(Y = 1 | X, B, γ) and E(B | X, θ) for a Gaussian distribution of B. For binary B we propose an alternate expression. The simulation results comparing these methods indicate that historical information on Pr(Y = 1 | X, β) can improve the efficiency of estimation and enhance the predictive power in the regression model of interest Pr(Y = 1 | X, B, γ). We illustrate our methodology by enhancing the High-grade Prostate Cancer Prevention Trial Risk Calculator with two new biomarkers, prostate cancer antigen 3 and TMPRSS2:ERG.

Keywords: Bayesian methods; Constrained estimation; Logistic regression; Prediction models.
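
Illustrative sketch (not taken from the paper): the abstract does not state the approximate relationship it derives for Gaussian B, so the snippet below assumes one standard form from the measurement error literature. If B | X is normal with mean θ0 + θX·X and standard deviation σ, and logit Pr(Y = 1 | X, B) = γ0 + γX·X + γB·B, then logit Pr(Y = 1 | X) is approximately c·(γ0 + γB·θ0) + c·(γX + γB·θX)·X, where c = 1/sqrt(1 + γB²σ²/1.7²). The function names, the symbol σ, the single scalar covariate and the parameter values are all hypothetical choices made for illustration. The code compares this approximation with the marginal probability obtained by numerically integrating B out of the expanded model.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def marginal_exact(x, gamma0, gamma_x, gamma_b, theta0, theta_x, sigma, n_nodes=60):
    """Pr(Y=1 | X=x), integrating the expanded model over B | X ~ N(theta0 + theta_x*x, sigma^2).

    Uses probabilists' Gauss-Hermite quadrature (weight function exp(-t^2 / 2)).
    """
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    b = theta0 + theta_x * x + sigma * nodes
    p = expit(gamma0 + gamma_x * x + gamma_b * b)
    # The weights sum to sqrt(2*pi), so divide to get a standard normal expectation.
    return float(np.sum(weights * p) / np.sqrt(2.0 * np.pi))


def implied_reduced_coefs(gamma0, gamma_x, gamma_b, theta0, theta_x, sigma):
    """Approximate (beta0, beta_x) of the reduced model implied by gamma, theta and sigma.

    Assumed form: logistic-probit scaling constant 1.7 (an assumption, see lead-in).
    """
    c = 1.0 / np.sqrt(1.0 + (gamma_b * sigma) ** 2 / 1.7 ** 2)
    beta0 = c * (gamma0 + gamma_b * theta0)
    beta_x = c * (gamma_x + gamma_b * theta_x)
    return beta0, beta_x


def marginal_approx(x, gamma0, gamma_x, gamma_b, theta0, theta_x, sigma):
    """Pr(Y=1 | X=x) under the approximate reduced logistic model."""
    beta0, beta_x = implied_reduced_coefs(gamma0, gamma_x, gamma_b, theta0, theta_x, sigma)
    return float(expit(beta0 + beta_x * x))


# Hypothetical parameter values, chosen only to exercise the functions.
params = dict(gamma0=-1.0, gamma_x=0.5, gamma_b=0.8, theta0=0.2, theta_x=0.3, sigma=1.0)
grid = np.linspace(-2.0, 2.0, 9)
max_gap = max(abs(marginal_exact(x, **params) - marginal_approx(x, **params)) for x in grid)
print(f"largest gap between exact and approximate Pr(Y=1 | X): {max_gap:.4f}")
```

Read in the other direction, a relationship of this kind is what lets published estimates of β act as approximate constraints on (γ, θ) when fitting the expanded model to the small new dataset. How those constraints are imposed is specific to each approach in the paper and is not reproduced here.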