Review of alignment and SNP calling algorithms for next-generation sequencing data

M Mielczarek; J Szyda

doi:10.1007/s13353-015-0292-7

J Appl Genet. 2016 Feb;57(1):71-9. doi: 10.1007/s13353-015-0292-7. Epub 2015 Jun 9.

Authors

M Mielczarek¹, J Szyda²

Affiliations

¹ Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kożuchowska 7, 51-631, Wroclaw, Poland. magda.mielczarek@up.wroc.pl.
² Biostatistics Group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kożuchowska 7, 51-631, Wroclaw, Poland.

PMID: 26055432
DOI: 10.1007/s13353-015-0292-7

Abstract

The application of massively parallel sequencing technology has become one of the most important issues in the life sciences. It was therefore crucial to develop bioinformatics tools for processing next-generation sequencing (NGS) data. Currently, two of the most significant tasks are alignment to a reference genome and detection of single nucleotide polymorphisms (SNPs). In many types of genomic analyses, large numbers of reads need to be mapped to the reference genome, so selection of the aligner is an essential step in NGS pipelines. Two main classes of algorithms, based on suffix tries and on hash tables, have been introduced for this purpose. Suffix array-based aligners are memory-efficient and work faster than hash-based aligners, but they are less accurate. In contrast, hash table algorithms tend to be slower but more sensitive. SNP and genotype callers may likewise be divided into two main approaches: heuristic and probabilistic methods. A variety of software has been developed over the past several years. In this paper, we briefly review the current development of NGS data processing algorithms and present the available software.

Keywords: Alignment; Genotype calling; NGS; Review; SNP calling; Software.
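
The two indexing strategies named in the abstract can be illustrated with a toy exact-match lookup. This is only a minimal sketch, not the method of any aligner covered by the review. The function names, the seed length `k`, and the example sequences are hypothetical. The naive sorted suffix array stands in for the compressed suffix-based indexes used in practice, and mismatches, gaps, and base qualities are ignored.

```python
from bisect import bisect_left
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash-table index: map every k-mer to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hash_lookup(reference, index, k, read):
    """Seed with the read's first k-mer, then verify the whole read at each hit.

    Assumes len(read) >= k; shorter reads simply return no hits.
    """
    hits = []
    for pos in index.get(read[:k], []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


def build_suffix_array(reference):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(reference)), key=lambda i: reference[i:])


def suffix_array_lookup(reference, suffix_array, read):
    """Binary-search the sorted suffixes for those that start with the read."""
    m = len(read)
    # Materialising the truncated suffixes is acceptable only at toy scale.
    prefixes = [reference[i:i + m] for i in suffix_array]
    lo = bisect_left(prefixes, read)
    hits = []
    while lo < len(prefixes) and prefixes[lo] == read:
        hits.append(suffix_array[lo])
        lo += 1
    return sorted(hits)


# Hypothetical example data.
reference = "ACGTACGTTAGC"
k = 3
index = build_kmer_index(reference, k)
suffix_array = build_suffix_array(reference)

print(hash_lookup(reference, index, k, "ACGT"))
print(suffix_array_lookup(reference, suffix_array, "ACGT"))
```

Both lookups report the same start positions for an exact match. The speed, memory, and sensitivity trade-offs described in the abstract come from how real aligners compress these indexes and extend seeds into inexact alignments, which this sketch does not attempt.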

Publication types

Review

MeSH terms

Algorithms*
Computational Biology
High-Throughput Nucleotide Sequencing / methods*
Polymorphism, Single Nucleotide*
Sequence Alignment
Sequence Analysis, DNA / methods*
Software*