Background: Many studies in psychological and educational research aim to estimate population average treatment effects (PATE) using data from large complex survey samples, and many of these studies use propensity score methods. Recent work has investigated how to incorporate survey weights into propensity score methods. However, that work has not been well summarized, and it is not clear how much difference the choice of PATE estimation method makes empirically.
Purpose: The purpose of this study is to systematically summarize the appropriate use of survey weights in propensity score analysis of complex survey data and to use a case study to empirically compare PATE estimates from multiple analysis methods, including ordinary least squares regression, weighted least squares regression, and various propensity score applications.
Methods: We first summarize various propensity score methods that handle survey weights. We then demonstrate the performance of various analysis methods using a nationally representative data set, the Early Childhood Longitudinal Study-Kindergarten, to estimate the effects of preschool on children's academic achievement. We evaluate the correspondence of the results using multiple criteria.
Results and conclusions: It is important for researchers to think carefully about their estimand of interest and to use methods appropriate for that estimand. If the goal is to draw inferences about the survey target population, it is important to take the survey weights into account, particularly in the outcome analysis stage when estimating the PATE. In the case study, however, the various analysis methods yield similar estimates, although this is only one applied example.
Keywords: complex surveys; equivalence test; population average treatment effects; propensity scores; survey weights.
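
Illustration: The point about using survey weights in the outcome analysis stage can be made concrete with a minimal sketch. It assumes one common approach, in which each unit's survey weight is multiplied by its inverse-probability-of-treatment weight and the PATE is taken as the difference in weighted outcome means. This is not necessarily the implementation used in the study. The function name `weighted_pate` and the arguments `y` (outcome), `t` (treatment indicator), `ps` (estimated propensity score), and `sw` (survey weight) are hypothetical, and the propensity scores are assumed to have been estimated beforehand.

```python
def weighted_pate(y, t, ps, sw):
    """Difference in weighted outcome means (Hajek-style estimator).

    Each unit's weight is its survey weight times its inverse probability
    of receiving the treatment condition it actually received.
    All names here are illustrative assumptions, not the study's code.
    """
    num1 = den1 = num0 = den0 = 0.0
    for yi, ti, pi, wi in zip(y, t, ps, sw):
        if not 0.0 < pi < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        if ti == 1:
            w = wi / pi            # treated: survey weight * 1 / ps
            num1 += w * yi
            den1 += w
        else:
            w = wi / (1.0 - pi)    # control: survey weight * 1 / (1 - ps)
            num0 += w * yi
            den0 += w
    if den1 == 0.0 or den0 == 0.0:
        raise ValueError("need at least one treated and one control unit")
    return num1 / den1 - num0 / den0


# Toy data (made up for illustration only).
y = [1.0, 2.0, 3.0, 4.0]
t = [1, 0, 1, 0]
ps = [0.5, 0.5, 0.5, 0.5]
sw = [1.0, 1.0, 2.0, 2.0]
estimate = weighted_pate(y, t, ps, sw)
print(estimate)  # approximately -1.0 for this toy data
```

Setting every entry of `sw` to 1 reduces this to an ordinary inverse-probability-weighted estimate, which targets the sample rather than the survey population. Variance estimation that accounts for the survey design is outside the scope of this sketch.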