Background: Indirect adjustment via partitioned regression is a promising technique to control for unmeasured confounding in large epidemiological studies. The method uses a representative ancillary dataset to estimate the association between the variables missing from a primary dataset and the complete set of variables in the ancillary dataset, and uses that association to produce an adjusted risk estimate for the variable in question. The objective of this paper is threefold: 1) evaluate the method for non-linear survival models, 2) formalize an empirical process to evaluate the suitability of the required ancillary matching dataset, and 3) test modifications to the method to incorporate time-varying exposure data and proportional weighting of datasets.
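The core idea described above can be illustrated with the standard omitted-variable identity for a linear model: if a confounder u is left out, the unadjusted coefficient is shifted by approximately Δγ, where Δ is the regression of u on the observed variables (estimable from the ancillary dataset) and γ is the effect of u on the outcome. The following is a minimal sketch under simplifying assumptions, not the procedure used in this paper: it uses ordinary least squares rather than a survival model, simulated data, and a hypothetical value of γ treated as known. All names (`simulate`, `beta_adj`, and so on) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def simulate(n):
    """Hypothetical population: exposure x and one confounder u correlated with x."""
    x = rng.normal(size=n)
    u = 0.5 * x + rng.normal(size=n)          # confounder, e.g. a behavioural risk factor
    X = np.column_stack([np.ones(n), x])      # intercept + exposure
    return X, u

# Primary dataset: outcome is observed, but u is NOT available to the analyst.
n_primary = 50_000
X_p, u_p = simulate(n_primary)
y_p = X_p @ np.array([1.0, 0.2]) + 0.8 * u_p + rng.normal(size=n_primary)

# Ancillary dataset: drawn from the same population, with u observed.
X_a, u_a = simulate(5_000)

beta_unadj = ols(X_p, y_p)   # biased, because u is omitted
delta = ols(X_a, u_a)        # association of u with the observed variables
gamma = 0.8                  # ASSUMED known effect of u on the outcome

# Indirect adjustment: remove the estimated omitted-variable shift.
beta_adj = beta_unadj - delta * gamma

print("unadjusted exposure coefficient:", beta_unadj[1])
print("indirectly adjusted coefficient:", beta_adj[1])
```

In this simulation the true exposure coefficient is 0.2, and the unadjusted estimate is pulled toward 0.6 because u is correlated with the exposure. The adjustment only works to the extent that the ancillary sample reproduces the exposure-confounder relationship of the primary sample, which is why the representativeness of the ancillary dataset matters.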
Methods: We used the association between fine particle air pollution (PM2.5) and mortality in the 2001 Canadian Census Health and Environment Cohort (CanCHEC, N = 2.4 million, 10-year follow-up) as our primary dataset, and the 2001 cycle of the Canadian Community Health Survey (CCHS, N = 80,630) as the ancillary matching dataset that contained confounding risk factor information not available in CanCHEC (e.g., smoking). The main evaluation process used a gold-standard approach wherein two variables (education and income) available in both datasets were excluded, indirectly adjusted for, and compared to true models with education and income included to assess the amount of bias correction. An internal validation for objective 1 used only CanCHEC data, whereas an external validation for objective 2 replaced CanCHEC with the CCHS. The two proposed modifications were applied as part of the validation tests, as well as in a final indirect adjustment of four missing risk factor variables (smoking, alcohol use, diet, and exercise) in which adjustment direction and magnitude were compared to models using an equivalent longitudinal cohort with direct adjustment for the same variables.
Results: At baseline (2001), both cohorts had very similar PM2.5 distributions across population characteristics, although levels for CCHS participants were consistently 1.8-2.0 μg/m3 lower. Applying sample-weighting largely corrected for this discrepancy. The internal validation tests showed minimal downward bias in PM2.5 mortality hazard ratios of 0.4-0.6% using a static exposure, and 1.7-3% when a time-varying exposure was used. The external validation of the CCHS as the ancillary dataset showed slight upward bias of -0.7 to -1.1% and downward bias of 1.3-2.3% using the static and time-varying approaches, respectively.
Conclusions: The CCHS was found to be reasonably representative of CanCHEC, and its use in Canada for indirect adjustment is warranted. Indirect adjustment methods can be used with survival models to correct hazard ratio point estimates and standard errors in models missing key covariates when a representative matching dataset is available. The results of this formal evaluation should encourage other cohorts to assess the suitability of ancillary datasets for the application of the indirect adjustment methodology to address potential residual confounding.
Keywords: Air pollution; Cohort study; Confounding; Indirect adjustment; Survival analysis.
Crown Copyright © 2019. Published by Elsevier Inc. All rights reserved.