Estimating hazard ratios in cohort data with missing disease information due to death

Nadine Binder; Anne-Sophie Herrnböck; Martin Schumacher

doi:10.1002/bimj.201500167

Biom J. 2017 Mar;59(2):251-269. doi: 10.1002/bimj.201500167. Epub 2016 Nov 21.

Authors

Nadine Binder¹ ², Anne-Sophie Herrnböck², Martin Schumacher²

Affiliations

¹ Freiburg Center for Data Analysis and Modeling, University of Freiburg, Eckerstr. 1, 79104, Freiburg, Germany.
² Institute for Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Stefan-Meier-Str. 26, 79104, Freiburg, Germany.

PMID: 27870130
DOI: 10.1002/bimj.201500167

Abstract

In clinical and epidemiological studies, information on the primary outcome of interest, that is, the disease status, is usually collected at a limited number of follow-up visits. The disease status can often only be retrieved retrospectively in individuals who are alive at follow-up, but will be missing for those who died beforehand. Right-censoring the death cases at the last visit (ad-hoc analysis) yields biased hazard ratio estimates of a potential risk factor, and the bias can be substantial and occur in either direction. In this work, we investigate three different approaches that use the same likelihood contributions derived from an illness-death multistate model in order to more adequately estimate the hazard ratio by including the death cases in the analysis: a parametric approach, a penalized likelihood approach, and an imputation-based approach. We investigate to what extent these approaches allow for an unbiased regression analysis by evaluating their performance in simulation studies and on a real data example. In doing so, we use the full cohort with complete illness-death data as reference and artificially induce missing information due to death by setting discrete follow-up visits. Compared to an ad-hoc analysis, all considered approaches provide less biased or even unbiased results, depending on the situation studied. In the real data example, the parametric approach is seen to be too restrictive, whereas the imputation-based approach could almost reconstruct the original event history information.
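
To make the missing-data mechanism concrete, the following is a minimal sketch, not the authors' simulation design, of how discrete follow-up visits hide disease onsets that are followed by death before the next visit. Everything in it is an illustrative assumption: the visit schedule, the constant transition hazards `a01`, `a02`, `a12`, the hazard ratios `hr01` and `hr_death` for a binary risk factor `x`, and the function names. For simplicity, the visit at which disease is first seen is used as the ad-hoc event time.

```python
import random

VISITS = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical visit times (years)


def simulate_subject(rng, x, a01=0.05, a02=0.03, a12=0.10, hr01=2.0, hr_death=1.5):
    """Simulate one illness-death path and the data an ad-hoc analysis would see."""
    # Constant transition hazards, scaled by an assumed hazard ratio when x == 1.
    h01 = a01 * (hr01 if x else 1.0)      # healthy -> diseased
    h02 = a02 * (hr_death if x else 1.0)  # healthy -> dead
    h12 = a12 * (hr_death if x else 1.0)  # diseased -> dead

    t_ill = rng.expovariate(h01)    # latent time of disease onset
    t_death = rng.expovariate(h02)  # latent time of death without disease
    diseased = t_ill < t_death
    if diseased:
        t_death = t_ill + rng.expovariate(h12)  # death after disease onset

    # Disease status is only observed at visits the subject attends alive.
    attended = [v for v in VISITS if v < t_death]
    diagnosed_at = next((v for v in attended if diseased and v >= t_ill), None)

    if diagnosed_at is not None:
        adhoc_time, adhoc_event = diagnosed_at, 1
    else:
        # Ad-hoc analysis: right-censor at the last visit attended alive.
        adhoc_time, adhoc_event = (attended[-1] if attended else 0.0), 0

    return {
        "x": x,
        "true_event": int(diseased and t_ill <= VISITS[-1]),
        "adhoc_event": adhoc_event,
        "adhoc_time": adhoc_time,
        "died_in_followup": t_death <= VISITS[-1],
    }


def simulate_cohort(n, seed=1):
    """Simulate n subjects, alternating the binary risk factor x between 0 and 1."""
    rng = random.Random(seed)
    return [simulate_subject(rng, x=i % 2) for i in range(n)]


cohort = simulate_cohort(2000)
# True disease cases that the ad-hoc analysis treats as censored.
missed = [s for s in cohort if s["true_event"] and not s["adhoc_event"]]
print(len(missed), "disease cases are invisible to the ad-hoc analysis")
```

Every missed case in this sketch is a subject who became diseased and then died before the next scheduled visit. Because the assumed risk factor changes both disease and death hazards, the share of missed cases can differ between the `x = 0` and `x = 1` groups, which is one way the ad-hoc hazard ratio estimate can become biased.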

Keywords: Disease incidence; Illness-death model; Missing disease information; Multiple imputation.

MeSH terms

Biometry / methods*
Cohort Studies
Epidemiologic Studies*
Humans
Likelihood Functions
Proportional Hazards Models*
Uncertainty