SNP Discovery from Single and Multiplex Genome Assemblies of Non-model Organisms

Methods Mol Biol. 2018:1712:113-144. doi: 10.1007/978-1-4939-7514-3_9.

Abstract

Population genetic studies of non-model organisms often rely on initial ascertainment of genetic markers from a single individual or a small pool of individuals. This initial screening has been a significant barrier to beginning population studies on non-model organisms (Aitken et al., Mol Ecol 13:1423-1431, 2004; Morin et al., Trends Ecol Evol 19:208-216, 2004). As genomic data become increasingly available for non-model species, SNP ascertainment from across the genome can be performed directly from published genome contigs and short-read archive data. Alternatively, low- to medium-coverage shotgun NGS library sequencing of single or pooled samples, or sequencing of reduced-representation libraries (e.g., capture enrichment; Hancock-Hanser et al., Mol Ecol Resour 13:254-268, 2013), can produce sufficient new data for SNP discovery with limited investment. We describe protocols for assembly of short-read data to reference or related-species genome contig sequences, followed by SNP discovery and filtering to obtain an optimal set of SNPs for population genotyping using a variety of downstream high-throughput genotyping methods.
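
The filtering step mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not specify filter criteria, so everything below is an assumption made for illustration: the `Snp` record, the `filter_snps` function, and all threshold values (depth range, quality, alternate-allele fraction, flank length) are hypothetical and are not the chapter's protocol. The sketch keeps candidate SNPs that have adequate read support and that sit far enough from contig ends and from other candidate variants to leave clean flanking sequence for assay design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snp:
    """Hypothetical candidate SNP record (not a format defined by the source)."""
    contig: str
    pos: int        # 1-based position on the contig
    depth: int      # number of reads covering the site
    alt_reads: int  # number of reads supporting the alternate allele
    qual: float     # variant quality score from the caller


def filter_snps(snps, contig_lengths, min_depth=10, max_depth=100,
                min_qual=30.0, min_alt_frac=0.2, flank=50):
    """Return candidate SNPs that pass per-site and flanking-region filters.

    All default thresholds are illustrative assumptions, not published values.
    """
    # Index every candidate position by contig. Unfiltered candidates are
    # included on purpose: any putative variant in the flank could interfere
    # with primer or probe design, even if it fails the per-site filters.
    positions = {}
    for s in snps:
        positions.setdefault(s.contig, []).append(s.pos)

    kept = []
    for s in snps:
        # Per-site filters: coverage window, quality, alternate-allele support.
        if s.depth <= 0 or not (min_depth <= s.depth <= max_depth):
            continue
        if s.qual < min_qual:
            continue
        if s.alt_reads / s.depth < min_alt_frac:
            continue

        # Flanking filter 1: require `flank` bases on both sides within the contig.
        length = contig_lengths[s.contig]
        if s.pos <= flank or s.pos > length - flank:
            continue

        # Flanking filter 2: no other candidate variant within `flank` bases.
        if any(p != s.pos and abs(p - s.pos) <= flank
               for p in positions[s.contig]):
            continue

        kept.append(s)
    return kept
```

In practice the depth window would be chosen relative to the observed mean coverage, and the flank length would depend on the requirements of the downstream genotyping platform.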

Keywords: Bioinformatics; Non-model organisms; Reference-guided assembly; Short read archive; Single-nucleotide polymorphism.

MeSH terms

  • Animals
  • Contig Mapping
  • DNA, Bacterial
  • Genetic Markers
  • Genome / genetics*
  • Genomic Library
  • High-Throughput Nucleotide Sequencing / methods
  • Multiplex Polymerase Chain Reaction / methods*
  • Polymorphism, Single Nucleotide / genetics*
  • Sequence Alignment
  • Single-Cell Analysis / methods*
  • Software
  • Statistics as Topic

Substances

  • DNA, Bacterial
  • Genetic Markers