Computing highly specific and noise-tolerant oligomers efficiently

Tomoyuki Yamada; Shinichi Morishita

doi:10.1142/s0219720004000454

J Bioinform Comput Biol. 2004 Mar;2(1):21-46. doi: 10.1142/s0219720004000454.

Authors

Tomoyuki Yamada¹, Shinichi Morishita

Affiliation

¹ Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5, Kashinoha, Kashiwa City, Chiba Pref. 277-8562, Japan. yamada@gi.k.u-tokyo.ac.jp

PMID: 15272431
DOI: 10.1142/s0219720004000454

Abstract

The sequencing of the genomes of a variety of species and the growing databases containing expressed sequence tags (ESTs) and complementary DNAs (cDNAs) facilitate the design of highly specific oligomers for use as genomic markers, PCR primers, or DNA oligo microarrays. The first step in evaluating the specificity of short oligomers of about 20 units in length is to determine the frequencies at which the oligomers occur. However, for oligomers longer than about 50 units this is not informative, as they usually have a frequency of only 1. A more suitable procedure is to consider the mismatch tolerance of an oligomer, that is, the minimum number of mismatches that allows a given oligomer to match a substring other than the target sequence anywhere in the genome or the EST database. However, calculating the exact value of mismatch tolerance is computationally costly and impractical. Therefore, we studied the problem of checking whether an oligomer meets the constraint that its mismatch tolerance is no less than a given threshold. Here, we present an efficient dynamic programming algorithm that utilizes suffix and height arrays. We demonstrated the effectiveness of this algorithm by efficiently computing a dense list of numerous oligo-markers applicable to the human genome. Experimental results show that the algorithm runs faster than the well-known Abrahamson's algorithm by orders of magnitude and is able to enumerate 65% to 76% of qualified oligomers.
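To make the definition of mismatch tolerance concrete, the following is a minimal brute-force sketch in Python. It is not the suffix-array and height-array dynamic programming algorithm described in the abstract; it only illustrates the quantity being bounded and the cheaper threshold check. The function names (`hamming_capped`, `mismatch_tolerance`, `meets_threshold`) are hypothetical, and the sketch assumes that the oligomer is given by its start position in a single text, that only the window at that position is excluded as the target, and that reverse complements are ignored.

```python
def hamming_capped(a, b, cap):
    """Count mismatches between equal-length strings, stopping once `cap` is reached."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches >= cap:
                break
    return mismatches


def mismatch_tolerance(text, start, length):
    """Minimum number of mismatches between the oligomer text[start:start+length]
    and any other window of the same length in `text` (naive scan, assumption:
    only the window at `start` counts as the target)."""
    oligo = text[start:start + length]
    best = length  # a window can differ in at most `length` positions
    for pos in range(len(text) - length + 1):
        if pos == start:
            continue  # skip the target occurrence itself
        best = min(best, hamming_capped(oligo, text[pos:pos + length], best))
        if best == 0:
            break  # an exact duplicate exists; tolerance cannot go lower
    return best


def meets_threshold(text, start, length, threshold):
    """Check whether the mismatch tolerance is no less than `threshold`
    without computing its exact value."""
    oligo = text[start:start + length]
    for pos in range(len(text) - length + 1):
        if pos == start:
            continue
        if hamming_capped(oligo, text[pos:pos + length], threshold) < threshold:
            return False  # some other window is too similar
    return True


if __name__ == "__main__":
    genome = "ACGTAACGATGGCC"  # toy sequence, not real data
    print(mismatch_tolerance(genome, 0, 4))
    print(meets_threshold(genome, 0, 4, 2))
```

The threshold check can stop comparing a window as soon as it has seen `threshold` mismatches, which is one reason the decision problem is cheaper than computing the exact tolerance. Even so, this scan costs time proportional to the text length times the oligomer length per oligomer, which is the kind of cost an index-based method aims to avoid.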

Copyright Imperial College Press

Publication types

Comparative Study
Evaluation Study
Validation Study

MeSH terms

Algorithms*
Base Sequence
DNA Probes / chemistry*
DNA Probes / genetics
Gene Expression Profiling / methods*
Molecular Sequence Data
Oligonucleotide Array Sequence Analysis / methods*
Oligonucleotides / chemistry*
Oligonucleotides / genetics
Reproducibility of Results
Sensitivity and Specificity
Sequence Alignment / methods*
Sequence Analysis, DNA / methods*
Stochastic Processes

Substances

DNA Probes
Oligonucleotides