Binary Interval Search: a scalable algorithm for counting interval intersections

Bioinformatics. 2013 Jan 1;29(1):1-7. doi: 10.1093/bioinformatics/bts652. Epub 2012 Nov 4.

Abstract

Motivation: The comparison of diverse genomic datasets is fundamental to understanding genome biology. Researchers must explore many large datasets of genome intervals (e.g. genes, sequence alignments) to place their experimental results in a broader context and to make new discoveries. Relationships between genomic datasets are typically measured by identifying intervals that intersect, that is, intervals that overlap and thus share a common genome interval. Given the continued advances in DNA sequencing technologies, efficient methods for measuring statistically significant relationships between many sets of genomic features are crucial for future discovery.

Results: We introduce the Binary Interval Search (BITS) algorithm, a novel and scalable approach to interval set intersection. We demonstrate that BITS outperforms existing methods at counting interval intersections. Moreover, we show that BITS is intrinsically suited to parallel computing architectures, such as graphics processing units, by illustrating its utility for efficient Monte Carlo simulations measuring the significance of relationships between sets of genomic intervals.
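
The abstract does not spell out how the counting works, so the following is only a minimal sketch of one way to count interval intersections with binary searches, not the authors' implementation. It assumes closed integer intervals given as (start, end) tuples on a single chromosome, and the function name count_intersections is hypothetical. The idea is that a database interval fails to intersect a query only if it ends before the query starts or starts after the query ends, so both exclusions can be counted with one binary search each over independently sorted start and end coordinates.

```python
from bisect import bisect_left, bisect_right


def count_intersections(queries, database):
    """Count query/database interval pairs that overlap.

    Assumption (not from the abstract): intervals are closed
    (start, end) tuples with start <= end, all on one chromosome.
    """
    # Sort start and end coordinates independently; the pairing
    # between a start and its end is not needed for counting.
    starts = sorted(s for s, _ in database)
    ends = sorted(e for _, e in database)
    n = len(database)

    total = 0
    for q_start, q_end in queries:
        # Database intervals whose end lies strictly before the query start.
        ending_before = bisect_left(ends, q_start)
        # Database intervals whose start lies strictly after the query end.
        starting_after = n - bisect_right(starts, q_end)
        # Everything not excluded on either side must overlap the query.
        total += n - ending_before - starting_after
    return total


database = [(1, 5), (4, 8), (10, 12), (20, 25)]
queries = [(3, 4), (9, 9), (11, 22)]
print(count_intersections(queries, database))
```

Because each query needs only two independent binary searches and no shared state, queries could in principle be processed in parallel, which is consistent with the abstract's point about parallel architectures.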

Availability: https://github.com/arq5x/bits.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms*
  • Genomics / methods*
  • Monte Carlo Method
  • Sequence Alignment
  • Sequence Analysis, DNA