Two-phase studies are attractive for their economy and efficiency in research settings where large cohorts are available for investigating the prognostic and predictive role of novel genetic and biological factors. In this type of study, information on novel factors is collected only in a convenient subcohort (phase II) drawn from the cohort (phase I) according to a given (optimal) sampling strategy. Estimation of survival in the subcohort must account for this design. The Kaplan-Meier method, which is based on counts of events and of subjects at risk over time, must be applied with weights that reflect the sampling probabilities of the phase II subjects, so that the subcohort is again representative of the entire cohort. The authors derived a proper variance estimator of survival by linearization. The proposed method is applied to a two-phase study on childhood acute lymphoblastic leukemia, which was planned to evaluate the role of genetic polymorphisms in treatment failure due to relapse. The method performed satisfactorily in simulations under different scenarios, including the case-control setting, and proved useful for describing results in the clinical example.
Keywords: case–control; missing data; optimal sampling; survival; two-phase design.
© The Author(s) 2014.
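
As an illustration of the weighting idea described in the abstract, the following is a minimal sketch of a Kaplan-Meier estimate in which each phase II subject is weighted by the inverse of its sampling probability. It is not the authors' implementation: the function name `weighted_kaplan_meier` and its arguments are hypothetical, the sampling probabilities are assumed to be known, and the linearization-based variance estimator is not reproduced here.

```python
from collections import defaultdict


def weighted_kaplan_meier(times, events, sampling_probs):
    """Inverse-probability-weighted Kaplan-Meier estimate (illustrative sketch).

    times          -- observed follow-up time of each phase II subject
    events         -- 1 if the subject had the event, 0 if censored
    sampling_probs -- assumed known probability that the subject was
                      sampled into phase II (hypothetical input)

    Returns a list of (event_time, survival_estimate) pairs.
    """
    # Each sampled subject stands in for 1/pi subjects of the full cohort.
    weights = [1.0 / p for p in sampling_probs]

    # Weighted number of events at each distinct event time.
    weighted_events = defaultdict(float)
    for t, e, w in zip(times, events, weights):
        if e:
            weighted_events[t] += w

    curve = []
    survival = 1.0
    for t in sorted(weighted_events):
        # Weighted number of subjects still at risk just before time t.
        at_risk = sum(w for u, w in zip(times, weights) if u >= t)
        survival *= 1.0 - weighted_events[t] / at_risk
        curve.append((t, survival))
    return curve


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    toy = weighted_kaplan_meier(
        times=[1, 2, 3], events=[1, 1, 0], sampling_probs=[0.5, 1.0, 0.5]
    )
    for t, s in toy:
        print(f"t={t}: S(t)={s:.3f}")
```

When every sampling probability equals 1, the weights are all 1 and the sketch reduces to the ordinary Kaplan-Meier estimate, which is a quick sanity check on the weighting.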