Assessment and Selection of Competing Models for Zero-Inflated Microbiome Data

PLoS One. 2015 Jul 6;10(7):e0129606. doi: 10.1371/journal.pone.0129606. eCollection 2015.

Abstract

Typical data in a microbiome study consist of operational taxonomic unit (OTU) counts, which are characterized by excess zeros that investigators often ignore. In this paper, we compare the performance of competing methods for modeling zero-inflated data through extensive simulations and application to a microbiome study. These methods include standard parametric and non-parametric models, hurdle models, and zero-inflated models. We examine varying degrees of zero inflation, with or without dispersion in the count component, as well as different magnitudes and directions of the covariate effect on the structural zeros and the count component. We focus on the assessment of type I error, power to detect the overall covariate effect, measures of model fit, and the bias and efficiency of parameter estimation. We also evaluate the ability of model selection strategies based on the Akaike information criterion (AIC) or the Vuong test to identify the correct model. The simulation studies show that hurdle and zero-inflated models have well-controlled type I errors, higher power, and better goodness-of-fit measures, and that they estimate parameters more accurately and efficiently. In addition, the hurdle models have goodness of fit and parameter estimates for the count component similar to those of their corresponding zero-inflated models. However, the estimation and interpretation of the parameters for the zero component differ, and hurdle models are more stable when structural zeros are absent. We then discuss the model selection strategy for zero-inflated data and implement it in a gut microbiome study of more than 400 independent subjects.
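To make the AIC-based comparison mentioned in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It fits an intercept-only Poisson model and an intercept-only zero-inflated Poisson (ZIP) model to a small set of counts and compares their AIC values. The counts are hypothetical and chosen only to show excess zeros. The helper names (`poisson_loglik`, `zip_loglik`, `fit_zip`, `aic`) are assumptions introduced for illustration. The ZIP fit uses a simple EM iteration, covariates are omitted, and the Vuong test is not shown.

```python
import math

def poisson_loglik(counts, lam):
    # Log-likelihood of i.i.d. Poisson(lam) counts.
    return sum(-lam + y * math.log(lam) - math.lgamma(y + 1) for y in counts)

def zip_loglik(counts, pi, lam):
    # Zero-inflated Poisson: with probability pi the count is a structural
    # zero, otherwise it is drawn from Poisson(lam).
    total = 0.0
    for y in counts:
        if y == 0:
            total += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            total += (math.log(1.0 - pi) - lam
                      + y * math.log(lam) - math.lgamma(y + 1))
    return total

def fit_zip(counts, n_iter=500):
    # Intercept-only ZIP fitted by EM (illustrative, no convergence check).
    n = len(counts)
    n_zero = sum(1 for y in counts if y == 0)
    total = sum(counts)
    pi = 0.5 * n_zero / n            # starting value for the zero-inflation probability
    lam = total / (n - n_zero)       # starting value: mean of the positive counts
    for _ in range(n_iter):
        # E-step: probability that an observed zero is structural.
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        structural = n_zero * w
        # M-step: update the mixing weight and the Poisson mean.
        pi = structural / n
        lam = total / (n - structural)
    return pi, lam

def aic(loglik, n_params):
    # Akaike information criterion; lower values indicate a better trade-off.
    return 2 * n_params - 2 * loglik

# Hypothetical OTU counts with many zeros (not data from the study).
counts = [0] * 60 + [1] * 5 + [2] * 10 + [3] * 12 + [4] * 8 + [5] * 5

lam_pois = sum(counts) / len(counts)   # Poisson MLE is the sample mean
pi_hat, lam_hat = fit_zip(counts)

aic_pois = aic(poisson_loglik(counts, lam_pois), n_params=1)
aic_zip = aic(zip_loglik(counts, pi_hat, lam_hat), n_params=2)

print("Poisson AIC:", round(aic_pois, 1))
print("ZIP AIC:    ", round(aic_zip, 1))
```

On these hypothetical counts the ZIP model attains the lower AIC, because a single Poisson mean cannot account for both the large number of zeros and the size of the positive counts. A hurdle model would differ in that all zeros are modeled by the binary component and the positive counts follow a zero-truncated distribution.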

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Bias
  • Gastrointestinal Microbiome / physiology*
  • Gastrointestinal Tract / microbiology*
  • Humans
  • Models, Statistical
  • Regression Analysis

Grants and funding

All authors disclose no potential conflicts (financial, professional, or personal) that are relevant to the manuscript. Williams Turpin acknowledges a CAG/CIHR Ferring Pharmaceuticals Inc. award; Lizhen Xu is a recipient of a CIHR STAGE fellowship. The authors would like to acknowledge the GEM Project Steering Committee, Recruitment Site Directors and Global Project Office members supported by Crohn’s and Colitis Canada (CCC) and the Leona M. and Harry B. Helmsley Charitable Trust. The GEM Project Research Team is composed of i) Steering Committee: Paul Beck, Charles Bernstein, Alain Bitton, Kenneth Croitoru, Leo Dieleman, Brian Feagan, Anne Griffiths, David Guttman, Kevan Jacobson, Gil Kaplan, Karen Madsen, John Marshall, Paul Moayyedi, Mark Ropeleski, Ernest Seidman, Mark Silverberg, Kathy Siminovitch, Andy Stadnyk, Hillary Steinhart, Michael Surette, Dan Turner, Tom Walters, and Bruce Vallance; ii) Recruitment Center Directors: Guy Aumais, Paul Beck, Charles Bernstein, Alain Bitton, Brian Bressler, Herbert Brill, Maria Cino, Jeff Critch, Lee Denson, Colette Deslandres, Leo Dieleman, Martha Dirks, Wael El-Matary, Brian Feagan, Anne Griffiths, Hans Herfarth, Peter Higgins, Hien Huynh, Jeff Hyams, Kevan Jacobson, Gilaad Kaplan, Desmond Leddin, David Mack, John Marshall, Jerry McGrath, Anthony Otley, Remo Panaccione, Sophie Plamondon, Mark Ropeleski, Fred Saibil, Ernie Seidman, Corey Siegel, Mark Silverberg, Scott Snapper, Hillary Steinhart, and Dan Turner; and iii) The Global Project Office: Nellie Allam, Kenneth Croitoru, Alexandra Keludjian, Ana Olteanu, Kevin Ow and Isabelle Yeadon.