Halvade: scalable sequence analysis with MapReduce

Dries Decap; Joke Reumers; Charlotte Herzeel; Pascal Costanza; Jan Fostier

doi:10.1093/bioinformatics/btv179

Bioinformatics. 2015 Aug 1;31(15):2482-8. doi: 10.1093/bioinformatics/btv179. Epub 2015 Mar 26.

Authors

Dries Decap¹, Joke Reumers², Charlotte Herzeel³, Pascal Costanza⁴, Jan Fostier¹

Affiliations

¹ Department of Information Technology, Ghent University - iMinds, Gaston Crommenlaan 8 bus 201, 9050 Ghent, Belgium; ExaScience Life Lab, Kapeldreef 75, 3001 Leuven, Belgium.
² ExaScience Life Lab, Kapeldreef 75, 3001 Leuven, Belgium; Janssen Research & Development, a division of Janssen Pharmaceutica N.V., 2340 Beerse, Belgium.
³ ExaScience Life Lab, Kapeldreef 75, 3001 Leuven, Belgium; Imec, Kapeldreef 75, 3001 Leuven, Belgium.
⁴ ExaScience Life Lab, Kapeldreef 75, 3001 Leuven, Belgium; Intel Corporation Belgium.

Abstract

Motivation: Post-sequencing DNA analysis typically consists of read mapping followed by variant calling. Especially for whole genome sequencing, this computational step is very time-consuming, even when using multithreading on a multi-core machine.

Results: We present Halvade, a framework that enables sequencing pipelines to be executed in parallel on a multi-node and/or multi-core compute infrastructure in a highly efficient manner. As an example, a DNA sequencing analysis pipeline for variant calling has been implemented according to the GATK Best Practices recommendations, supporting both whole genome and whole exome sequencing. Using a 15-node computer cluster with 360 CPU cores in total, Halvade processes the NA12878 dataset (human, 100 bp paired-end reads, 50× coverage) in <3 h with very high parallel efficiency. Even on a single, multi-core machine, Halvade attains a significant speedup compared with running the individual tools with multithreading.
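The abstract reports "very high parallel efficiency" without giving a figure. As a reading aid, the sketch below shows how speedup and parallel efficiency are conventionally computed (speedup = baseline runtime / parallel runtime; efficiency = speedup / number of workers). The function names are illustrative and are not part of Halvade, and the runtimes in the usage lines are hypothetical placeholders, not measurements from the paper. Only the node and core counts come from the abstract.

```python
def speedup(t_baseline_h: float, t_parallel_h: float) -> float:
    """Speedup S = T_baseline / T_parallel (both runtimes in hours)."""
    if t_baseline_h <= 0 or t_parallel_h <= 0:
        raise ValueError("runtimes must be positive")
    return t_baseline_h / t_parallel_h


def parallel_efficiency(t_baseline_h: float, t_parallel_h: float, workers: int) -> float:
    """Efficiency E = S / workers; 1.0 means perfect linear scaling."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(t_baseline_h, t_parallel_h) / workers


# From the abstract: 15 nodes with 360 CPU cores in total.
NODES = 15
TOTAL_CORES = 360
cores_per_node = TOTAL_CORES // NODES  # 24 cores per node

# Hypothetical runtimes, for illustration only (NOT taken from the paper):
# a single-node baseline and a 15-node run of the same pipeline.
hypothetical_single_node_h = 48.0
hypothetical_cluster_h = 4.0
example_efficiency = parallel_efficiency(
    hypothetical_single_node_h, hypothetical_cluster_h, NODES
)
```

Here efficiency is measured relative to a single node. Measuring it relative to a single core instead would use the total core count as the number of workers and a single-core baseline runtime.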

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Genome, Human
Humans
Sequence Analysis, DNA / methods*
Software*