Estimating hazard ratios in cohort data with missing disease information due to death

Biom J. 2017 Mar;59(2):251-269. doi: 10.1002/bimj.201500167. Epub 2016 Nov 21.

Abstract

In clinical and epidemiological studies, information on the primary outcome of interest, the disease status, is usually collected at a limited number of follow-up visits. The disease status can often only be retrieved retrospectively in individuals who are alive at follow-up, but is missing for those who died before that visit. Right-censoring the death cases at the last visit (ad-hoc analysis) yields biased hazard ratio estimates of a potential risk factor, and the bias can be substantial and occur in either direction. In this work, we investigate three different approaches that use the same likelihood contributions derived from an illness-death multistate model in order to more adequately estimate the hazard ratio by including the death cases in the analysis: a parametric approach, a penalized likelihood approach, and an imputation-based approach. We investigate to what extent these approaches allow for an unbiased regression analysis by evaluating their performance in simulation studies and in a real data example. In doing so, we use the full cohort with complete illness-death data as reference and artificially induce missing information due to death by setting discrete follow-up visits. Compared to an ad-hoc analysis, all considered approaches provide less biased or even unbiased results, depending on the situation studied. In the real data example, the parametric approach is seen to be too restrictive, whereas the imputation-based approach could almost reconstruct the original event history information.
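The setup described in the abstract (complete illness-death data as reference, missingness induced by discrete visits, and an ad-hoc analysis that right-censors death cases at their last visit) can be illustrated with a minimal simulation sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the constant transition hazards, the visit schedule, the choice to date a diagnosis at the visit where it is first seen, and all function and variable names. The hazard ratio is estimated as a simple ratio of events per person-time (the maximum likelihood estimate under constant hazards), which stands in for the regression models used in the article. The sketch implements only the ad-hoc analysis and the complete-data reference, not the three proposed approaches.

```python
import random

VISIT_TIMES = (5.0, 10.0, 15.0, 20.0)  # assumed visit schedule, in years
END_OF_STUDY = VISIT_TIMES[-1]
TRUE_HR = 2.0                          # assumed true hazard ratio for disease onset


def simulate_subject(rng, exposed):
    """Simulate one subject from an illness-death model with constant hazards.

    Returns (full_time, full_event, adhoc_time, adhoc_event).
    All hazard values below are illustrative assumptions.
    """
    h01 = 0.05 * (TRUE_HR if exposed else 1.0)  # healthy -> diseased
    h02 = 0.03                                  # healthy -> dead
    h12 = 0.10                                  # diseased -> dead

    t_onset = rng.expovariate(h01)
    t_death = rng.expovariate(h02)
    if t_onset < t_death:
        # Disease occurs first; death then follows with the post-disease hazard.
        t_death = t_onset + rng.expovariate(h12)
    else:
        t_onset = float("inf")  # died without ever developing the disease

    # Reference data: exact onset time, censored at death or end of study.
    full_time = min(t_onset, t_death, END_OF_STUDY)
    full_event = t_onset <= END_OF_STUDY

    # Observed data: disease status is only assessed at visits attended alive.
    attended = [v for v in VISIT_TIMES if v < t_death]
    diagnosed = [v for v in attended if v >= t_onset]
    if diagnosed:
        # Assumption: the event is dated at the first visit showing disease.
        adhoc_time, adhoc_event = diagnosed[0], True
    else:
        # Ad-hoc analysis: right-censor at the last attended visit
        # (time 0 if the subject died before the first visit).
        adhoc_time = attended[-1] if attended else 0.0
        adhoc_event = False
    return full_time, full_event, adhoc_time, adhoc_event


def rate_ratio(records):
    """Hazard ratio as a ratio of events per person-time (exposed / unexposed)."""
    events = {True: 0, False: 0}
    person_time = {True: 0.0, False: 0.0}
    for exposed, time, event in records:
        events[exposed] += int(event)
        person_time[exposed] += time
    rate_exposed = events[True] / person_time[True]
    rate_unexposed = events[False] / person_time[False]
    return rate_exposed / rate_unexposed


def run_simulation(n_per_group=20000, seed=1):
    """Return (hr_full, hr_adhoc) for one simulated cohort."""
    rng = random.Random(seed)
    full, adhoc = [], []
    for exposed in (False, True):
        for _ in range(n_per_group):
            ft, fe, at, ae = simulate_subject(rng, exposed)
            full.append((exposed, ft, fe))
            adhoc.append((exposed, at, ae))
    return rate_ratio(full), rate_ratio(adhoc)


if __name__ == "__main__":
    hr_full, hr_adhoc = run_simulation()
    print(f"assumed true HR:          {TRUE_HR:.2f}")
    print(f"HR from complete data:    {hr_full:.3f}")
    print(f"HR from ad-hoc censoring: {hr_adhoc:.3f}")
```

Changing the assumed hazards (for example, letting exposure also affect mortality) or the visit spacing is a simple way to explore how the gap between the two estimates behaves in this toy setting.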

Keywords: Disease incidence; Illness-death model; Missing disease information; Multiple imputation.

MeSH terms

  • Biometry / methods*
  • Cohort Studies
  • Epidemiologic Studies*
  • Humans
  • Likelihood Functions
  • Proportional Hazards Models*
  • Uncertainty