HybGFS: a hybrid method for genome-fingerprint scanning

BMC Bioinformatics. 2006 Oct 29:7:479. doi: 10.1186/1471-2105-7-479.

Abstract

Background: Protein identification based on mass spectrometry (MS) has previously been performed using peptide mass fingerprinting (PMF) or tandem MS (MS/MS) database searching. However, these methods cannot identify proteins that are not already listed in existing databases. Moreover, the alternative approach of de novo sequencing requires costly equipment and the interpretation of complex MS/MS spectra. Thus, there is a need for novel high-throughput protein-identification methods that are independent of existing predefined protein databases.

Results: Here, we present HybGFS, a hybrid method for genome-fingerprint scanning. This technique combines genome sequence-based peptide MS/MS ion searching with liquid-chromatography elution-time (LC-ET) prediction to improve the reliability of identification. The hybrid method allows the simultaneous identification and mapping of proteins without a priori information about their coding sequences. The current study used standard LC-MS/MS data to query an in silico six-reading-frame translation and enzymatic digest of an entire genome. Used in conjunction with precursor/product ion-mass searching, the LC-ETs increased confidence in the peptide-identification process and reduced the number of false-positive matches. The power of this method was demonstrated using recombinant proteins from the Escherichia coli K12 strain.
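The in silico six-reading-frame translation and enzymatic digest described above can be illustrated with a minimal sketch. This is not the HybGFS implementation: the abstract does not name the enzyme, so trypsin (cleavage after K or R, but not before P) is an assumption here, and the function names, the `min_len` cutoff, and the handling of ambiguous codons as `X` are hypothetical choices made for illustration. The sketch uses the standard genetic code and monoisotopic residue masses, and it does not cover product-ion matching or LC-ET prediction.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate one frame; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(genome):
    """Yield (strand, offset, protein) for all six reading frames."""
    genome = genome.upper()
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            yield strand, offset, translate(seq[offset:])


def tryptic_peptides(protein, min_len=6):
    """Split at stop codons, then cleave after K/R unless followed by P.

    Trypsin is an assumed enzyme; peptides containing 'X' are dropped.
    """
    peptides = []
    for segment in protein.split("*"):
        for pep in re.split(r"(?<=[KR])(?!P)", segment):
            if len(pep) >= min_len and all(res in MONO for res in pep):
                peptides.append(pep)
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of the neutral peptide (residues plus one water)."""
    return sum(MONO[res] for res in peptide) + WATER


def genome_peptide_index(genome, min_len=6):
    """Build a list of (mass, peptide, strand, offset) sorted by mass."""
    index = []
    for strand, offset, protein in six_frames(genome):
        for pep in tryptic_peptides(protein, min_len):
            index.append((peptide_mass(pep), pep, strand, offset))
    return sorted(index)
```

A precursor-mass query would then amount to a tolerance window over the sorted masses returned by `genome_peptide_index`. In the approach the abstract describes, candidates are additionally checked against product ions and predicted elution times, which this sketch leaves out.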

Conclusion: The novel hybrid method described in this study will be useful for the large-scale experimental confirmation of genome coding sequences, without the need for transcriptome-level expression analysis or costly MS database searching.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chromatography, Liquid*
  • Databases, Genetic
  • Escherichia coli Proteins / chemistry
  • Escherichia coli Proteins / genetics*
  • Genes, Bacterial*
  • Peptide Mapping / methods*
  • Recombinant Proteins / chemistry
  • Reproducibility of Results
  • Sequence Analysis, Protein
  • Software
  • Tandem Mass Spectrometry*
  • Time Factors

Substances

  • Escherichia coli Proteins
  • Recombinant Proteins