Highly accurate classification of Watson-Crick basepairs on termini of single DNA molecules

Biophys J. 2003 Feb;84(2 Pt 1):967-76. doi: 10.1016/S0006-3495(03)74913-3.

Abstract

We introduce a computational method for classifying individual DNA molecules measured by an alpha-hemolysin channel detector. We show classification with better than 99% accuracy for DNA hairpin molecules that differ only in their terminal Watson-Crick basepairs. Signal classification was first done in silico to establish performance metrics (i.e., with training and test data of known type, drawn from single-species data files). It was then performed in solution to assay real mixtures of DNA hairpins. Hidden Markov Models (HMMs) were used with Expectation/Maximization for denoising and for associating a feature vector with the ionic current blockade of the DNA molecule. Support Vector Machines (SVMs) were used as discriminators, and were the focus of off-line training. A multiclass SVM architecture was designed to place less discriminatory load on weaker discriminators, and novel SVM kernels were used to boost discrimination strength. Tuning the HMMs and SVMs enabled biophysical analysis of the captured molecule states and state transitions; structure revealed by this analysis was then used to improve feature selection.
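
As a rough illustration of what HMM-based denoising of a blockade current trace can look like, the sketch below runs Viterbi decoding over a small HMM with Gaussian emissions. It is not the paper's implementation: the function name `viterbi_denoise`, the two current levels, the noise standard deviation, and the stay probability are all assumptions made for the example, and the parameters are held fixed rather than re-estimated by Expectation/Maximization.

```python
import math


def viterbi_denoise(samples, level_means, sigma, stay_prob):
    """Map each noisy sample to the mean of its most likely hidden level.

    Assumed toy model (not taken from the paper): each hidden state is a
    blockade level with Gaussian noise of standard deviation `sigma`, the
    chain stays in its current state with probability `stay_prob`, and
    the remaining probability is split evenly among the other states.
    """
    n_states = len(level_means)
    if n_states < 2:
        raise ValueError("need at least two levels")
    if not samples:
        return []

    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_states - 1))

    def log_emit(x, s):
        # Gaussian log-likelihood, dropping the constant shared by all states.
        return -0.5 * ((x - level_means[s]) / sigma) ** 2

    def log_trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform prior over the starting state (constant term dropped).
    score = [log_emit(samples[0], s) for s in range(n_states)]
    backptrs = []
    for x in samples[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans(p, s))
            new_score.append(score[best_prev] + log_trans(best_prev, s)
                             + log_emit(x, s))
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [level_means[s] for s in path]


# Invented trace (arbitrary current units) with one noise spike at index 2.
trace = [40.2, 39.1, 52.0, 40.8, 39.5, 60.3, 61.0, 54.0, 59.2, 60.5]
denoised = viterbi_denoise(trace, level_means=[40.0, 60.0],
                           sigma=5.0, stay_prob=0.95)
print(denoised)
```

With these made-up numbers the spike of 52.0 is closer to the upper level, so a simple midpoint threshold would flip it, while the transition penalty keeps the decoded path on the lower level until the sustained change at index 5. A level sequence of this kind is one possible starting point for building a feature vector to pass to a discriminator.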

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Algorithms*
  • Base Pair Mismatch
  • Base Pairing*
  • Base Sequence
  • Biosensing Techniques
  • DNA / analysis
  • DNA / chemistry*
  • DNA / classification*
  • Hemolysin Proteins / chemistry
  • Markov Chains
  • Molecular Probe Techniques
  • Molecular Sequence Data
  • Nanotechnology / methods
  • Neural Networks, Computer
  • Nucleic Acid Conformation
  • Pattern Recognition, Automated
  • Quality Control
  • Signal Processing, Computer-Assisted*

Substances

  • Hemolysin Proteins
  • DNA