Test Statistics and Statistical Inference for Data With Informative Cluster Sizes

Biom J. 2025 Feb;67(1):e70021. doi: 10.1002/bimj.70021.

Abstract

In biomedical studies, investigators often encounter clustered data. Cluster sizes are said to be informative if the outcome depends on the cluster size. Ignoring informative cluster sizes in the analysis leads to biased parameter estimates in marginal and mixed-effects regression models. Several methods for analyzing data with informative cluster sizes have been proposed; however, methods for testing the informativeness of cluster sizes are limited, particularly for marginal models. In this paper, we propose a score test and a Wald test to examine the informativeness of cluster sizes for a generalized linear model, a Cox model, and a proportional subdistribution hazards model. Statistical inference can be conducted through weighted estimating equations. The simulation results show that both tests control Type I error rates well; the score test has higher power than the Wald test for right-censored data, whereas the Wald test generally has higher power than the score test for binary outcomes. We apply the Wald and score tests to hematopoietic cell transplant data and compare regression analysis results with and without adjustment for informative cluster sizes.
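
To illustrate why informative cluster sizes matter, the following minimal Python sketch compares an unweighted (observation-level) estimate of a binary outcome mean with an estimate from a weighted estimating equation. It is not the tests proposed in the paper. The abstract does not specify the weights, so the inverse cluster size weights 1/n_i used here are an assumption (a common choice in the literature on informative cluster sizes), and the toy data and function names are hypothetical.

```python
# Toy clustered binary data: each inner list is one cluster.
# Hypothetical values chosen so that larger clusters have more events,
# i.e. the cluster size is informative about the outcome.
clusters = [
    [0],
    [0, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 1, 1, 0, 1, 1],
]


def unweighted_mean(clusters):
    """Each observation has weight 1, so large clusters dominate."""
    total = sum(sum(c) for c in clusters)
    n_obs = sum(len(c) for c in clusters)
    return total / n_obs


def cluster_weighted_mean(clusters):
    """Each observation has weight 1/n_i, so every cluster counts equally.

    This solves the weighted estimating equation
        sum_i (1/n_i) * sum_j (y_ij - mu) = 0
    for mu, which reduces to the average of the cluster means.
    (Assumed weighting; not taken from the abstract.)
    """
    sum_of_cluster_means = sum(sum(c) / len(c) for c in clusters)
    return sum_of_cluster_means / len(clusters)


pooled = unweighted_mean(clusters)
weighted = cluster_weighted_mean(clusters)
print(f"unweighted: {pooled:.4f}, cluster-weighted: {weighted:.4f}")
```

In this toy data set the two estimates differ because the event rate rises with cluster size. When cluster sizes are not informative, the two estimators target the same quantity, which is the kind of discrepancy a test of informativeness is meant to detect.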

Keywords: Wald test; clustered data; informative cluster sizes; score test.

MeSH terms

  • Biometry* / methods
  • Cluster Analysis
  • Hematopoietic Stem Cell Transplantation
  • Humans
  • Models, Statistical
  • Proportional Hazards Models
  • Regression Analysis