Pandora: nucleotide-resolution bacterial pan-genomics with reference graphs

Genome Biol. 2021 Sep 14;22(1):267. doi: 10.1186/s13059-021-02473-1.

Abstract

We present pandora, a novel pan-genome graph structure and algorithms for identifying variants across the full bacterial pan-genome. Because much bacterial adaptability hinges on the accessory genome, methods that analyze SNPs in only the core genome are inherently limited. Pandora approximates a sequenced genome as a recombinant of references, detects novel variation, and pan-genotypes multiple samples. Using a reference graph of 578 Escherichia coli genomes, we compare 20 diverse isolates. Pandora recovers more rare SNPs than single-reference-based tools, is significantly better than picking the closest RefSeq reference, and provides a stable framework for analyzing diverse samples without reference bias.

Keywords: Accessory genome; Genome graph; Nanopore; Pan-genome.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Escherichia coli / genetics
  • Genetic Variation
  • Genome, Bacterial*
  • Genomics / methods*
  • High-Throughput Nucleotide Sequencing
  • Nanopore Sequencing
  • Nucleotides
  • Sequence Alignment
  • Sequence Analysis, DNA
  • Software*

Substances

  • Nucleotides

Associated data

  • figshare/10.6084/m9.figshare.14779257.v1
  • figshare/10.6084/m9.figshare.14781732.v1
  • figshare/10.6084/m9.figshare.14781756.v1
  • figshare/10.6084/m9.figshare.14815899.v2