DELIMINATE--a fast and efficient method for loss-less compression of genomic sequences: sequence analysis

Bioinformatics. 2012 Oct 1;28(19):2527-9. doi: 10.1093/bioinformatics/bts467. Epub 2012 Jul 25.

Abstract

Summary: Next-generation sequencing platforms are currently generating an unprecedented quantity of genome sequence data. This has necessitated the development of novel bioinformatics approaches and algorithms that not only facilitate meaningful analysis of these data but also aid in the efficient compression, storage, retrieval and transmission of the huge volumes generated. We present a novel compression algorithm (DELIMINATE) that can rapidly compress genomic sequence data in a loss-less fashion. Validation results indicate that DELIMINATE achieves higher compression efficiency than popular general-purpose compression algorithms, namely gzip, bzip2 and lzma.
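For readers who want a feel for the general-purpose baselines named in the summary, the following is a minimal sketch using Python's standard gzip, bz2 and lzma modules. It does not implement or invoke DELIMINATE. The helper names (make_sequence, baseline_sizes) and the synthetic random input are assumptions made for illustration; real genomes contain repeats and skewed base composition, so sizes measured this way are not representative of the validation results reported in the paper.

```python
import bz2
import gzip
import lzma
import random


def make_sequence(length, seed=0):
    """Build a synthetic nucleotide string (hypothetical stand-in for real data)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length)).encode("ascii")


def baseline_sizes(data):
    """Return the compressed size in bytes for each general-purpose baseline."""
    compressors = {
        "gzip": gzip.compress,
        "bzip2": bz2.compress,
        "lzma": lzma.compress,
    }
    return {name: len(compress(data)) for name, compress in compressors.items()}


if __name__ == "__main__":
    seq = make_sequence(100_000)
    for name, size in baseline_sizes(seq).items():
        # Ratio = original bytes / compressed bytes; higher is better.
        print(f"{name}: {size} bytes, ratio {len(seq) / size:.2f}")
```

To try this on real data, replace make_sequence with code that reads the sequence lines of a FASTA file; a specialised tool can then be compared against the same byte counts.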

Availability and implementation: Linux, Windows and Mac implementations (both 32-bit and 64-bit) of DELIMINATE are freely available for download at: http://metagenomics.atc.tcs.com/compression/DELIMINATE.

Contact: sharmila@atc.tcs.com

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

  • Algorithms*
  • Base Sequence
  • Computational Biology / methods*
  • Data Compression / methods*
  • Genomics / methods*
  • Sequence Analysis, DNA / methods