Proportion variables, also known as compositional data, are very common in ecology. Unfortunately, few scientists are aware of how compositional data, when used as covariates, can adversely impact statistical analysis. We describe here how proportion covariates result in multicollinearity and parameter identifiability problems. Using simulated data on bird species richness as a function of land use, we show how these problems manifest when fitting a wide range of models in R, in both frequentist and Bayesian frameworks. In particular, we show that similar models can often generate substantially different parameter estimates, leading to very different conclusions. Dropping a covariate or the intercept from the model can solve the multicollinearity and parameter identifiability problems. Unfortunately, these solutions do not fix the inherent challenges associated with interpreting parameter estimates. To address this, we propose focusing the interpretation on the differences between slope parameters, which avoids the inherent unidentifiability of individual parameters. We also propose conditional plots with two x-axes and marginal plots as visualization techniques that can help users better interpret their modeling results. We illustrate these problems and proposed solutions using empirical data from the North American Breeding Bird Survey. The practical and straightforward approaches suggested in this article will help researchers fit linear models and interpret their results when some of the covariates are proportions.
Keywords: compositional covariates; conditional plot; inference; linear model; marginal plot; multicollinearity; parameter identifiability; parameter interpretation.
© 2024 The Ecological Society of America.
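The identifiability problem described in the abstract can be illustrated with a minimal sketch. It is written in plain Python rather than R, and it is not the authors' simulation: the three land-use proportions, the parameter values, and the helper names (`predict`, `drop_last_covariate`) are all assumptions made up for illustration. Because the proportions at each site sum to 1, adding a constant to the intercept and subtracting it from every slope leaves the predictions unchanged, so only differences between slopes are determined by the data.

```python
import random

def predict(intercept, slopes, proportions):
    """Linear predictor: intercept + sum of slope * proportion."""
    return intercept + sum(b * p for b, p in zip(slopes, proportions))

def drop_last_covariate(intercept, slopes):
    """Reparameterize with the last proportion as the reference category.

    The new intercept absorbs the last slope, and each remaining slope
    becomes a difference from that last slope.
    """
    last = slopes[-1]
    return intercept + last, [b - last for b in slopes[:-1]]

random.seed(1)

# Hypothetical sites with three land-use proportions that sum to 1.
sites = []
for _ in range(5):
    raw = [random.random() for _ in range(3)]
    total = sum(raw)
    sites.append([r / total for r in raw])

# Two different parameter sets (illustrative values, not from the paper).
intercept_a, slopes_a = 10.0, [4.0, -2.0, 1.0]
shift = 3.0
intercept_b = intercept_a + shift
slopes_b = [b - shift for b in slopes_a]

# Both parameter sets give the same prediction at every site.
pred_a = [predict(intercept_a, slopes_a, s) for s in sites]
pred_b = [predict(intercept_b, slopes_b, s) for s in sites]
max_gap = max(abs(a - b) for a, b in zip(pred_a, pred_b))

# After dropping a covariate, both sets map to the same parameters.
reduced_a = drop_last_covariate(intercept_a, slopes_a)
reduced_b = drop_last_covariate(intercept_b, slopes_b)

print("largest prediction gap:", max_gap)
print("reduced parameters (a):", reduced_a)
print("reduced parameters (b):", reduced_b)
```

In this sketch the two full parameter sets cannot be told apart from their predictions, while the reduced model has a single set of parameters. Each remaining slope in the reduced model is a difference from the dropped category, which is why it has to be interpreted relative to that category and not as a stand-alone effect.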