Modeling read counts for CNV detection in exome sequencing data

Stat Appl Genet Mol Biol. 2011 Nov 8;10(1):Article 52. doi: 10.2202/1544-6115.1732.

Abstract

Varying depth of high-throughput sequencing reads along a chromosome makes it possible to observe copy number variants (CNVs) in a sample relative to a reference. In exome and other targeted sequencing projects, technical factors increase variation in read depth while reducing the number of observed locations, which makes CNVs harder to identify. We present a hidden Markov model for detecting CNVs from raw read count data, using background read depth from a control set as well as other positional covariates such as GC-content. The model, exomeCopy, is applied to a large chromosome X exome sequencing project, identifying a list of large unique CNVs. CNVs predicted by the model and experimentally validated are then recovered using a cross-platform control set from publicly available exome sequencing data. Simulations show high sensitivity for detecting heterozygous and homozygous CNVs, outperforming normalization and state-of-the-art segmentation methods.
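To make the idea of a hidden Markov model over read counts concrete, the following is a minimal sketch and not the exomeCopy implementation. It assumes Poisson emissions whose mean is the control-set background depth scaled by copy number, a single "stay" probability for transitions, and no GC-content or other covariates; the abstract does not specify any of these details. The names `viterbi_copy_number`, `poisson_logpmf`, `stay`, and `floor` are hypothetical and chosen for illustration only.

```python
import math


def poisson_logpmf(k, mu):
    """Log probability of observing count k under a Poisson with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def viterbi_copy_number(counts, background, copy_states=(0, 1, 2, 3, 4),
                        normal=2, stay=0.99, floor=0.01):
    """Most likely copy number per target (illustrative assumptions only).

    counts      raw read counts per target in the sample
    background  expected read depth per target at the normal copy number,
                e.g. estimated from a control set
    normal      copy number that the background depth corresponds to
    stay        assumed probability of keeping the same state between targets
    floor       assumed minimum depth ratio, so copy number 0 has a positive mean
    """
    n = len(copy_states)
    log_stay = math.log(stay)
    log_move = math.log((1.0 - stay) / (n - 1))

    def emit(t, s):
        # Expected depth scales linearly with copy number relative to normal.
        ratio = max(copy_states[s] / normal, floor)
        return poisson_logpmf(counts[t], background[t] * ratio)

    def trans(p, s):
        return log_stay if p == s else log_move

    # Uniform initial distribution over copy-number states.
    score = [-math.log(n) + emit(0, s) for s in range(n)]
    pointers = []
    for t in range(1, len(counts)):
        prev = score
        # For each state, remember the best predecessor state.
        best = [max(range(n), key=lambda p, s=s: prev[p] + trans(p, s))
                for s in range(n)]
        score = [prev[best[s]] + trans(best[s], s) + emit(t, s)
                 for s in range(n)]
        pointers.append(best)

    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for best in reversed(pointers):
        state = best[state]
        path.append(state)
    path.reverse()
    return [copy_states[s] for s in path]


if __name__ == "__main__":
    background = [100.0] * 20
    counts = [100] * 8 + [50] * 5 + [100] * 7  # simulated heterozygous deletion
    print(viterbi_copy_number(counts, background))
```

With the simulated input above, the five middle targets at half the background depth are called as copy number 1 and the rest as copy number 2. For hemizygous regions, such as chromosome X in male samples, `normal` would be set to 1 under the same assumptions.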

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Base Composition
  • Chromosomes, Human, X / genetics
  • Computer Simulation
  • DNA Copy Number Variations*
  • Databases, Genetic
  • Exome*
  • Gene Frequency
  • Genetic Carrier Screening
  • Homozygote
  • Humans
  • Markov Chains*
  • Predictive Value of Tests
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods*