Modeling genome coverage in single-cell sequencing

Bioinformatics. 2014 Nov 15;30(22):3159-65. doi: 10.1093/bioinformatics/btu540. Epub 2014 Aug 8.

Abstract

Motivation: Single-cell DNA sequencing is necessary for examining genetic variation at the cellular level, which remains hidden in bulk sequencing experiments. But because single-cell experiments begin with such small amounts of starting material, the amount of information obtained from a single-cell sequencing experiment is highly sensitive to the choice of protocol and to variability in library preparation. In particular, the fraction of the genome represented in single-cell sequencing libraries exhibits extreme variability due to quantitative biases in amplification and loss of genetic material.

Results: We propose a method to predict the genome coverage of a deep sequencing experiment using information from an initial shallow sequencing experiment whose reads are mapped to a reference genome. The observed coverage statistics are used in a non-parametric empirical Bayes Poisson model to estimate the gain in coverage from deeper sequencing. This approach allows researchers to estimate statistical features of deep sequencing experiments without actually sequencing deeply, providing a basis for optimizing and comparing single-cell sequencing protocols or screening libraries.
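The core idea of a non-parametric empirical Bayes Poisson extrapolation can be illustrated with the classical Good-Toulmin series. If n_j is the number of genomic positions covered exactly j times in the shallow run, and each position's coverage is assumed Poisson with its own unknown rate, then the expected number of newly covered positions after sequencing (1 + t) times as deeply is the alternating sum of (-1)^(j+1) t^j n_j over j >= 1. The sketch below is a minimal illustration under these assumptions, not the preseq implementation: it takes a hypothetical list of per-base depths as input, treats positions as independent (reads spanning adjacent bases actually correlate them), and is only reliable for t <= 1, where the raw series behaves well. The function names are hypothetical. Stabilizing the series for larger extrapolations is exactly what the published software addresses, and this sketch does not attempt it.

```python
from collections import Counter
from typing import Dict, Iterable


def coverage_histogram(depths: Iterable[int]) -> Dict[int, int]:
    """Map j -> number of positions covered exactly j times (j >= 1).

    `depths` is a hypothetical per-base depth track from the shallow run;
    uncovered positions (depth 0) are ignored because they are unobserved.
    """
    counts = Counter(d for d in depths if d > 0)
    return dict(sorted(counts.items()))


def good_toulmin_gain(hist: Dict[int, int], t: float) -> float:
    """Expected number of *additional* covered positions when sequencing
    depth grows to (1 + t) times the initial depth.

    Assumes each position's coverage is Poisson with its own rate
    (the empirical Bayes setting). The alternating series is unbiased,
    but its variance explodes for t > 1, so use it only for t <= 1.
    """
    return sum(((-1) ** (j + 1)) * (t ** j) * n_j for j, n_j in hist.items())


def expected_covered(hist: Dict[int, int], t: float) -> float:
    """Positions covered now plus the expected gain at (1 + t)x depth."""
    return sum(hist.values()) + good_toulmin_gain(hist, t)


if __name__ == "__main__":
    depths = [0, 1, 1, 2, 0, 3, 1, 0]  # toy depths for illustration, not real data
    hist = coverage_histogram(depths)
    print(hist)                         # {1: 3, 2: 1, 3: 1}
    print(expected_covered(hist, 0.5))  # 6.375
```

Positions seen once dominate the estimated gain: many singletons signal that much of the genome is still being sampled for the first time, while positions covered many times contribute little new information.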

Availability and implementation: The method is available as part of the preseq software package. Source code is available at http://smithlabresearch.org/preseq.

Contact: andrewds@usc.edu

Supplementary information: Supplementary material is available at Bioinformatics online.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Bayes Theorem
  • Gene Library
  • Genetic Variation
  • Genome, Human
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Models, Genetic
  • Sequence Analysis, DNA / methods*
  • Single-Cell Analysis