16Stimator: statistical estimation of ribosomal gene copy numbers from draft genome assemblies

ISME J. 2016 Apr;10(4):1020-4. doi: 10.1038/ismej.2015.161. Epub 2015 Sep 11.

Abstract

The 16S rRNA gene (16S) is an accepted marker of bacterial taxonomic diversity, even though differences in copy number obscure the relationship between amplicon and organismal abundances. Ancestral state reconstruction methods can predict 16S copy numbers through comparisons with closely related reference genomes; however, the database of closed genomes is limited. Here, we extend the reference database of 16S copy numbers to de novo assembled draft genomes by developing 16Stimator, a method to estimate 16S copy numbers when these repetitive regions collapse during assembly. Using a read depth approach, we estimate 16S copy numbers for 12 endophytic isolates from Arabidopsis thaliana and confirm estimates by qPCR. We further apply this approach to draft genomes deposited in NCBI and demonstrate accurate copy number estimation regardless of sequencing platform, with an overall median deviation of 14%. The expanded database of isolates with 16S copy number estimates increases the power of phylogenetic correction methods for determining organismal abundances from 16S amplicon surveys.
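
The read depth idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how 16Stimator computes its estimate, so everything below is an assumption made for illustration: the copy number is taken as the ratio of the median read depth over the collapsed 16S region to the median depth over presumed single-copy regions. The function names `estimate_copy_number` and `percent_deviation`, the definition of deviation, and the depth values are all hypothetical and are not taken from the paper.

```python
from statistics import median


def estimate_copy_number(depth_16s, depth_single_copy):
    """Estimate 16S copy number as a depth ratio (illustrative assumption).

    depth_16s: per-base read depths over the collapsed 16S region.
    depth_single_copy: per-base read depths over presumed single-copy regions.
    """
    if not depth_16s or not depth_single_copy:
        raise ValueError("both depth lists must be non-empty")
    background = median(depth_single_copy)
    if background <= 0:
        raise ValueError("single-copy background depth must be positive")
    # If k copies collapse into one assembled locus, reads from all k copies
    # map to it, so its depth should be roughly k times the background depth.
    return median(depth_16s) / background


def percent_deviation(estimate, reference):
    """Absolute deviation of an estimate from a reference count, in percent.

    This is one plausible definition of deviation, assumed here.
    """
    return abs(estimate - reference) / reference * 100.0


# Made-up per-base depths, for illustration only.
depth_16s = [352, 348, 361, 355, 349]
depth_single_copy = [49, 51, 50, 52, 48, 50]

ratio = estimate_copy_number(depth_16s, depth_single_copy)
copies = round(ratio)
print(f"depth ratio = {ratio:.2f}, rounded copy number = {copies}")
```

Medians are used rather than means so that a few positions with unusual coverage do not dominate the ratio. In practice, the per-base depths would come from mapping the sequencing reads back to the draft assembly.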

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Arabidopsis / microbiology*
  • Bacteria / genetics
  • Bacteroides
  • Computational Biology
  • Escherichia coli
  • Gene Dosage
  • Genome, Bacterial*
  • Phylogeny
  • Plant Leaves / genetics*
  • Pseudomonas aeruginosa
  • RNA, Ribosomal, 16S / genetics*
  • Reproducibility of Results
  • Sequence Analysis, DNA / methods*
  • Staphylococcus aureus

Substances

  • RNA, Ribosomal, 16S