We consider a situation where rich historical data are available, from a large study, on the coefficients and standard errors of an established regression model describing the association between a binary outcome variable Y and a set of predictors X. We would like to use this summary information to improve estimation and prediction in an expanded model of interest, Y | X, B. The additional variable B is a new biomarker, measured on a small number of subjects in a new dataset. We develop and evaluate several approaches for translating the external information into constraints on the regression coefficients in a logistic regression model of Y | X, B. Borrowing from the measurement error literature, we establish an approximate relationship between the regression coefficients in the models Pr(Y = 1 | X, β), Pr(Y = 1 | X, B, γ) and E(B | X, θ) for a Gaussian distribution of B. For binary B we propose an alternative expression. Simulation results comparing these methods indicate that historical information on Pr(Y = 1 | X, β) can improve the efficiency of estimation and enhance the predictive power of the regression model of interest, Pr(Y = 1 | X, B, γ). We illustrate our methodology by enhancing the High-grade Prostate Cancer Prevention Trial Risk Calculator with two new biomarkers, prostate cancer antigen 3 and TMPRSS2:ERG.
Keywords: Bayesian methods; Constrained estimation; Logistic regression; Prediction models.
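
As a rough illustration of the kind of approximate relationship the abstract refers to for Gaussian B, the sketch below uses the standard probit-type approximation from the measurement error literature. It assumes the expanded model is logit Pr(Y = 1 | X, B) = γ0 + γX X + γB B and that B | X is normal with mean θ0 + θX X and variance σ². The exact expression used in the paper is not given in the abstract, so the scaling constant 1.7, the function names, and all parameter values here are assumptions made for illustration only. The code compares the implied reduced-model coefficients with a numerical integration over B.

```python
import numpy as np


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))


def approx_beta(gamma0, gamma_x, gamma_b, theta0, theta_x, sigma):
    """Approximate (beta0, beta_x) of logit Pr(Y=1 | X) implied by the
    expanded model. Assumed form: attenuate the linear predictor by
    1 / sqrt(1 + gamma_b^2 sigma^2 / 1.7^2)."""
    c = 1.0 / np.sqrt(1.0 + (gamma_b * sigma) ** 2 / 1.7 ** 2)
    beta0 = c * (gamma0 + gamma_b * theta0)
    beta_x = c * (gamma_x + gamma_b * theta_x)
    return beta0, beta_x


def exact_marginal(x, gamma0, gamma_x, gamma_b, theta0, theta_x, sigma,
                   n_nodes=60):
    """Pr(Y=1 | X=x), integrating the expanded model over
    B | X ~ N(theta0 + theta_x * x, sigma^2) by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    b = theta0 + theta_x * x + sigma * nodes
    p = expit(gamma0 + gamma_x * x + gamma_b * b)
    # hermegauss weights integrate against exp(-z^2 / 2); normalize.
    return float(np.sum(weights * p) / np.sqrt(2.0 * np.pi))


# Hypothetical parameter values, chosen only for illustration.
params = dict(gamma0=-1.0, gamma_x=0.8, gamma_b=1.2,
              theta0=0.3, theta_x=0.5, sigma=1.0)

beta0, beta_x = approx_beta(**params)
xs = np.linspace(-3.0, 3.0, 25)
approx_p = expit(beta0 + beta_x * xs)
exact_p = np.array([exact_marginal(x, **params) for x in xs])
max_abs_err = float(np.max(np.abs(approx_p - exact_p)))
print("approximate beta:", beta0, beta_x)
print("max abs error in Pr(Y=1 | X):", max_abs_err)
```

Under these assumptions, known values of β from the historical study could be read as approximate constraints linking γ, θ and σ, which is the idea the abstract describes. The sketch only checks the approximation numerically and does not implement any of the estimation approaches.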