Estimating copy numbers of alleles from population-scale high-throughput sequencing data

BMC Bioinformatics. 2015;16 Suppl 1(Suppl 1):S4. doi: 10.1186/1471-2105-16-S1-S4. Epub 2015 Jan 21.

Abstract

Background: With the recent development of microarray and high-throughput sequencing (HTS) technologies, a number of studies have revealed catalogs of copy number variants (CNVs) and their associations with phenotypes and complex traits. In parallel, a number of approaches for predicting CNV regions and genotypes have been proposed for both microarray and HTS data. However, only a few approaches focus on haplotyping of CNV loci.

Results: We propose a novel approach that simultaneously infers copy unit alleles and their copy numbers in each sample from population-scale HTS data, using variational Bayesian inference on a generative probabilistic model inspired by latent Dirichlet allocation, a well-studied model for document classification. In simulation studies, we evaluated concordance between inferred and true copy unit alleles for lower-, middle-, and higher-copy-number datasets; precision and recall were ≥ 0.9 for data with mean coverage ≥ 10× per copy unit. We also applied the approach to HTS data from 1123 samples at the highly variable salivary amylase gene locus and at a pseudogene locus, and confirmed consistency of the estimated alleles within samples belonging to a trio of CEPH/Utah pedigree 1463 with 11 offspring.
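
To make the idea of a generative model over copy unit alleles concrete, the following is a minimal illustrative sketch in Python. It is not the model described in the paper. It only assumes that each sample carries some number of copies of each allele, that each allele is a 0/1 vector over variant sites (1 = alternate base), and that reads at a site show the alternate base in proportion to the copies that carry it. All names (`expected_alt_fraction`, `simulate_read_counts`, `coverage_per_copy`, `error_rate`) and the toy alleles are hypothetical, and read depth is fixed rather than random for simplicity.

```python
import random


def expected_alt_fraction(copy_counts, alleles, site):
    """Fraction of a sample's copies that carry the alternate base at `site`."""
    total = sum(copy_counts)
    alt = sum(c for c, allele in zip(copy_counts, alleles) if allele[site] == 1)
    return alt / total


def simulate_read_counts(copy_counts, alleles, coverage_per_copy, error_rate, rng):
    """Draw (alt_reads, depth) for every site of one sample.

    Assumption: depth is exactly coverage_per_copy * total copies, and each
    read is miscalled independently with probability error_rate.
    """
    n_sites = len(alleles[0])
    depth = coverage_per_copy * sum(copy_counts)
    counts = []
    for site in range(n_sites):
        p = expected_alt_fraction(copy_counts, alleles, site)
        # Probability that a read is observed as alternate after miscalls.
        p_obs = p * (1 - error_rate) + (1 - p) * error_rate
        alt = sum(1 for _ in range(depth) if rng.random() < p_obs)
        counts.append((alt, depth))
    return counts


# Toy example (hypothetical data): three alleles over three variant sites.
alleles = [(0, 0, 1), (1, 0, 1), (1, 1, 0)]
copy_counts = [2, 1, 1]  # copies of each allele carried by one sample
rng = random.Random(0)
counts = simulate_read_counts(copy_counts, alleles, 10, 0.01, rng)
```

An inference procedure works in the opposite direction: given only read counts like `counts` across many samples, it estimates the allele vectors and the per-sample copy numbers that best explain them.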

Conclusions: Our proposed approach enables detailed analysis of copy number variations, such as association studies between copy unit alleles and phenotypes or biological features, including human diseases.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles*
  • Amylases / genetics
  • Bayes Theorem
  • Computational Biology / methods*
  • DNA Copy Number Variations / genetics*
  • Female
  • Genetics, Population
  • Haplotypes
  • High-Throughput Nucleotide Sequencing*
  • Humans
  • Male
  • Models, Statistical
  • Pedigree
  • Phenotype
  • Saliva / enzymology
  • Utah

Substances

  • Amylases