Marginal modeling in community randomized trials with rare events: Utilization of the negative binomial regression model

Clin Trials. 2022 Apr;19(2):162-171. doi: 10.1177/17407745211063479. Epub 2022 Jan 6.

Abstract

Background/aims: This work is motivated by the HEALing Communities Study, a post-test only cluster randomized trial in which communities are randomized to one of two trial arms. The primary interest is in reducing opioid overdose fatalities, which will be collected as a count outcome at the community level. Communities range in size from thousands to over one million residents, and fatalities are expected to be rare. Traditional marginal modeling approaches in the cluster randomized trial literature include the use of generalized estimating equations with an exchangeable correlation structure when utilizing subject-level data, or analogously quasi-likelihood based on an over-dispersed binomial variance when utilizing community-level data. These approaches account for and estimate the intra-cluster correlation coefficient, which should be provided in the results from a cluster randomized trial. Alternatively, the coefficient of variation or R coefficient could be reported. In this article, we show that negative binomial regression can also be utilized when communities are large and events are rare. The objectives of this article are (1) to show that the negative binomial regression approach targets the same marginal regression parameter(s) as an over-dispersed binomial model and to explain why the estimates may differ; (2) to derive formulas relating the negative binomial overdispersion parameter k to the intra-cluster correlation coefficient, coefficient of variation, and R coefficient; and (3) to analyze pre-intervention data from the HEALing Communities Study to demonstrate and contrast models and to show how to report the intra-cluster correlation coefficient, coefficient of variation, and R coefficient when utilizing negative binomial regression.

Methods: Negative binomial and over-dispersed binomial regression modeling are contrasted in terms of model setup, regression parameter estimation, and formulation of the overdispersion parameter. Three specific models are used to illustrate concepts and address the third objective.
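
As a rough illustration of how the two variance formulations can be contrasted, the sketch below uses standard textbook parameterizations, which are an assumption here and may differ in detail from those in the article. The symbols n_i (community size), \pi_i (marginal event probability), \mu_i = n_i \pi_i (expected count), and \rho (intra-cluster correlation coefficient) are introduced for illustration only.

```latex
% Negative binomial variance with overdispersion parameter k (assumed form)
\operatorname{Var}_{\mathrm{NB}}(Y_i) = \mu_i + k\,\mu_i^{2},
\qquad \mu_i = n_i \pi_i

% Over-dispersed binomial variance with intra-cluster correlation \rho (assumed form)
\operatorname{Var}_{\mathrm{OB}}(Y_i) = n_i \pi_i (1 - \pi_i)\,\bigl[\,1 + (n_i - 1)\rho\,\bigr]

% For large n_i and small \pi_i: (1 - \pi_i) \approx 1 and (n_i - 1) \approx n_i, so
\operatorname{Var}_{\mathrm{OB}}(Y_i) \approx n_i \pi_i + n_i^{2} \pi_i \rho
  = \mu_i + \frac{\rho}{\pi_i}\,\mu_i^{2}

% Matching the coefficients of \mu_i^{2} suggests
k \approx \frac{\rho}{\pi_i}
```

Under these assumptions, the two variance functions nearly coincide when communities are large and events are rare, which is the setting the abstract describes.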

Results: The negative binomial regression approach targets the same marginal regression parameter(s) as an over-dispersed binomial model, although estimates may differ. Practical differences arise in regard to how overdispersion, and hence the intra-cluster correlation coefficient, is modeled. The negative binomial overdispersion parameter is approximately equal to the ratio of the intra-cluster correlation coefficient to the marginal probability, the square of the coefficient of variation, and the R coefficient minus 1. As a result, estimates corresponding to all four of these overdispersion parameterizations can be reported when utilizing negative binomial regression.
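
The approximate relationships stated in the Results can be inverted to convert an estimate of k into the other three quantities. The following is a minimal sketch, not the authors' code: the function name `overdispersion_summaries` and the input values are hypothetical, and the conversions are only the approximations given above (k ≈ ICC / marginal probability, k ≈ CV squared, k ≈ R minus 1).

```python
import math


def overdispersion_summaries(k, marginal_prob):
    """Convert a negative binomial overdispersion estimate k into the
    approximate ICC, coefficient of variation (CV), and R coefficient.

    Hypothetical helper: it applies only the approximate relationships
    stated in the abstract, which assume large communities and rare events.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if not 0 < marginal_prob < 1:
        raise ValueError("marginal_prob must lie strictly between 0 and 1")
    return {
        "icc": k * marginal_prob,  # from k ~ ICC / marginal probability
        "cv": math.sqrt(k),        # from k ~ CV^2
        "r": 1.0 + k,              # from k ~ R - 1
    }


# Illustrative inputs only (not values from the HEALing Communities Study).
summaries = overdispersion_summaries(k=0.04, marginal_prob=0.0002)
for name, value in summaries.items():
    print(f"{name}: {value:.6g}")
```

Because the marginal probability is small when events are rare, the implied intra-cluster correlation coefficient is much smaller than k itself, while the coefficient of variation and R coefficient depend on k alone.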

Conclusion: Negative binomial regression provides a valid and practical alternative approach to the analysis of count data, and to the corresponding reporting of overdispersion parameters, from community randomized trials in which communities are large and events are rare.

Trial registration: ClinicalTrials.gov NCT04111939.

Keywords: Cluster randomized trial; empirical covariance matrix; generalized estimating equations; intra-cluster correlation coefficient; quasi-likelihood.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Cluster Analysis
  • Humans
  • Likelihood Functions
  • Models, Statistical*
  • Randomized Controlled Trials as Topic

Associated data

  • ClinicalTrials.gov/NCT04111939