Fast and covariate-adaptive method amplifies detection power in large-scale multiple hypothesis testing

Nat Commun. 2019 Jul 31;10(1):3433. doi: 10.1038/s41467-019-11247-0.

Abstract

Multiple hypothesis testing is an essential component of modern data science. In many settings, each hypothesis comes with covariates beyond its p-value, e.g., functional annotation of variants in genome-wide association studies. Such information is ignored by popular multiple testing approaches such as the Benjamini-Hochberg procedure (BH). Here we introduce AdaFDR, a fast and flexible method that adaptively learns the optimal p-value threshold from covariates to significantly improve detection power. On eQTL analysis of the GTEx data, AdaFDR discovers 32% more associations than BH at the same false discovery rate (FDR). We prove that AdaFDR controls the false discovery proportion and show in extensive experiments that it makes substantially more discoveries while controlling the FDR. AdaFDR is computationally efficient and allows multi-dimensional covariates with both numeric and categorical values, making it broadly useful across many applications.
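The abstract contrasts BH, which applies a single p-value cut-off to every hypothesis, with a threshold that depends on each hypothesis's covariates. The Python sketch below illustrates both ideas on toy data and is not AdaFDR's implementation. `benjamini_hochberg` is the standard BH step-up procedure. `covariate_rejections` and `mirror_fdp_estimate` are hypothetical helpers. The threshold function t(x) is hand-picked rather than learned, and the mirror-style estimate of the false discovery proportion is an illustrative assumption, since the abstract does not describe how AdaFDR estimates it.

```python
from typing import Callable, List, Sequence


def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.1) -> List[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Step-up rule: find the LARGEST rank k with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    return sorted(order[:k_max])


def covariate_rejections(
    pvals: Sequence[float],
    covs: Sequence[float],
    threshold: Callable[[float], float],
) -> List[int]:
    """Hypothetical helper: indices with p_i <= t(x_i) for a covariate-dependent t."""
    return [i for i, (p, x) in enumerate(zip(pvals, covs)) if p <= threshold(x)]


def mirror_fdp_estimate(
    pvals: Sequence[float],
    covs: Sequence[float],
    threshold: Callable[[float], float],
) -> float:
    """Hypothetical mirror-style FDP estimate; assumes t(x) <= 0.5 and Uniform(0, 1) nulls.

    A null p-value falls in [0, t] and in [1 - t, 1] with equal probability t,
    so the count in the mirrored region estimates the number of false discoveries.
    """
    discoveries = len(covariate_rejections(pvals, covs, threshold))
    mirrored = sum(p >= 1 - threshold(x) for p, x in zip(pvals, covs))
    return mirrored / max(discoveries, 1)


def toy_threshold(x: float) -> float:
    """Hand-picked (NOT learned) threshold: looser where the covariate flags x == 1."""
    return 0.04 if x == 1 else 0.001


if __name__ == "__main__":
    # Step-up behaviour: 0.05 misses its own cut-off (0.04), but 0.055 passes
    # the third cut-off (0.06), so all three smallest p-values are rejected.
    print(benjamini_hochberg([0.01, 0.05, 0.055, 0.9, 0.9], alpha=0.1))  # [0, 1, 2]

    # Toy data: covariate 1 marks hypotheses assumed to be enriched for signal.
    pvals = [0.015, 0.025, 0.035, 0.5, 0.6, 0.7, 0.8, 0.45, 0.35, 0.4]
    covs = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    print(benjamini_hochberg(pvals, alpha=0.1))                  # []
    print(covariate_rejections(pvals, covs, toy_threshold))      # [0, 1, 2]
    print(mirror_fdp_estimate(pvals, covs, toy_threshold))       # 0.0
```

On this toy data BH at level 0.1 rejects nothing, while the covariate-dependent threshold rejects the three small p-values flagged by the covariate, with an estimated false discovery proportion of 0. Because t(x) was chosen here after inspecting the data, this carries no FDR guarantee. A covariate-adaptive method has to learn t(x) in a way that preserves FDR control, which this sketch does not attempt.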

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms*
  • Data Interpretation, Statistical*
  • Genome-Wide Association Study
  • Humans
  • Magnetic Resonance Imaging
  • Microbiota / genetics
  • Polymorphism, Single Nucleotide / genetics
  • Proteomics
  • Quantitative Trait Loci / genetics
  • Research Design*
  • Sequence Analysis, RNA