Analysis of the X chromosome has been largely neglected in genetic studies, mainly because of its complex underlying biological mechanisms. On the other hand, the study of human microbiome data (typically over-dispersed counts with an excess of zeros) has generated great interest recently because of advancements in next-generation sequencing technologies. We propose a novel approach to infer the association between host genetic variants on the X chromosome and microbiome data. The method accounts for random X-chromosome inactivation (XCI), skewed (or nonrandom) XCI (XCI-S), and escape of XCI (XCI-E). The inference is performed through a finite mixture model (FMM), in which an indicator variable denoting the "true" biological mechanism is treated as missing data. An expectation-maximization algorithm on zero-inflated and two-part models is implemented to estimate genetic effects. We compare the performance of the FMM with strategies that assume a single mechanism, XCI or XCI-E, for all subjects, and with alternative approaches. Briefly, an XCI mechanism codes males' genotypes as those of homozygous females, whereas under XCI-E, males are treated as heterozygous females. Through comprehensive simulations, we evaluate hypothesis tests based on a computationally efficient score statistic. In summary, the FMM yields reduced bias and commensurate power compared to XCI, XCI-E, and alternative strategies while maintaining adequate Type 1 error control. The proposed method has far-reaching applications. In particular, we illustrate its usage on a large-scale human microbiome study, the Genetic, Environmental and Microbial (GEM) project, to test for genetic association on the X chromosome.
Keywords: X-chromosome association; finite mixture models; microbiome data; random/escape/skewed X-chromosome inactivation.
© 2019 Wiley Periodicals, Inc.
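The male genotype coding described in the abstract can be illustrated with a minimal sketch. It assumes an additive coding in which females carry 0, 1, or 2 copies of the allele of interest and males carry 0 or 1; the function name `code_x_genotypes` and its arguments are hypothetical and are not taken from the authors' software. Under XCI a male carrier is coded like a homozygous female (2), and under XCI-E like a heterozygous female (1).

```python
def code_x_genotypes(allele_counts, is_male, mechanism):
    """Return additive X-chromosome genotype codes (illustrative only).

    allele_counts: copies of the allele of interest per subject
                   (females 0/1/2, males 0/1).
    is_male:       booleans, True for male subjects.
    mechanism:     "XCI" (male carrier coded 2) or "XCI-E" (male carrier coded 1).
    """
    if mechanism == "XCI":
        male_scale = 2   # a male carrier is treated as a homozygous female
    elif mechanism == "XCI-E":
        male_scale = 1   # a male carrier is treated as a heterozygous female
    else:
        raise ValueError("mechanism must be 'XCI' or 'XCI-E'")

    if len(allele_counts) != len(is_male):
        raise ValueError("allele_counts and is_male must have equal length")

    codes = []
    for count, male in zip(allele_counts, is_male):
        if male:
            if count not in (0, 1):
                raise ValueError("males carry 0 or 1 copies on the X chromosome")
            codes.append(male_scale * count)
        else:
            if count not in (0, 1, 2):
                raise ValueError("females carry 0, 1, or 2 copies")
            codes.append(count)  # female coding is the same under both mechanisms
    return codes


# Three females (0, 1, 2 copies) followed by two males (0, 1 copies).
counts = [0, 1, 2, 0, 1]
males = [False, False, False, True, True]
xci_codes = code_x_genotypes(counts, males, "XCI")
xcie_codes = code_x_genotypes(counts, males, "XCI-E")
```

The two codings differ only for male carriers, which is why a model that assumes the wrong mechanism for all subjects can bias the estimated genetic effect. The FMM treats the mechanism indicator as missing data rather than fixing one coding in advance.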