Gene structure prediction using information on homologous protein sequence

Comput Appl Biosci. 1996 Jun;12(3):161-70. doi: 10.1093/bioinformatics/12.3.161.

Abstract

This paper describes a new approach to predicting the structure of protein-coding genes. The prediction scheme has three steps. First, the exons with the best potential are predicted in a sequence of unknown function, and a list of the potential amino acid fragments encoded by these exons is compiled. Second, each amino acid fragment in the list is tested for homology against proteins in the SWISS-PROT database of amino acid sequences, and the single protein with the best homology is chosen from among all the homologous sequences. Third, the exon-intron structure is reconstructed on the basis of its homology with the chosen protein sequence. The method was tested on an independent control set of 20 genes: 21% of real exons were missed, and false (non-real) exons were predicted at a rate of 3%. The system can be used to refine the output of gene prediction systems, especially when highly homologous proteins are present in the amino acid sequence database.
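The three-step scheme can be illustrated with a small toy sketch. It is not the authors' implementation: the k-mer overlap score stands in for a real homology search, the two-entry dictionary stands in for SWISS-PROT, and the names `kmer_fraction`, `predict_structure`, `min_fraction`, `TOY_TARGET` and `TOY_DECOY` are all hypothetical. Choosing the protein by summed fragment scores is likewise an assumption, since the abstract does not say how the "best homology" is measured.

```python
from typing import Dict, List, Tuple

# Standard genetic code, built from the canonical T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

Exon = Tuple[int, int, str]  # (start, end, in-frame DNA of the candidate exon)


def translate(dna: str) -> str:
    """Translate in-frame DNA; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def kmer_fraction(fragment: str, protein: str, k: int = 3) -> float:
    """Fraction of the fragment's distinct k-mers that occur in the protein.

    A crude stand-in for a homology score (assumption, not the paper's measure).
    """
    kmers = {fragment[i:i + k] for i in range(len(fragment) - k + 1)}
    if not kmers:
        return 0.0
    return sum(1 for km in kmers if km in protein) / len(kmers)


def predict_structure(
    candidates: List[Exon],
    database: Dict[str, str],
    min_fraction: float = 0.5,
) -> Tuple[str, List[Tuple[int, int]]]:
    # Step 1: turn the candidate exons into amino acid fragments.
    fragments = [(start, end, translate(dna)) for start, end, dna in candidates]

    # Step 2: choose the single protein with the best overall similarity.
    best_id = max(
        database,
        key=lambda pid: sum(kmer_fraction(f, database[pid]) for _, _, f in fragments),
    )
    protein = database[best_id]

    # Step 3: keep only the exons supported by the chosen protein.
    kept = [
        (start, end)
        for start, end, frag in fragments
        if kmer_fraction(frag, protein) >= min_fraction
    ]
    return best_id, sorted(kept)


# Toy data: two exons encode parts of TOY_TARGET; the middle candidate is false.
toy_candidates = [
    (100, 123, "ATGAAAACCGCTTATATTGCTAAA"),
    (300, 314, "GGGCCCGGGCCCGGG"),
    (500, 523, "CAGCGTCAGATTAGCTTTGTTAAA"),
]
toy_database = {
    "TOY_TARGET": "MKTAYIAKQRQISFVKSHFSRQ",
    "TOY_DECOY": "GAGAGAGAWWWWLLLL",
}
best_protein, exons = predict_structure(toy_candidates, toy_database)
```

In this toy case the unsupported middle candidate is discarded, which mirrors how homology to a database protein can filter out false exons. A realistic version would replace the k-mer score with a proper alignment and would also handle reading frames and splice-site boundaries.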

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Computers*
  • DNA / genetics
  • Databases, Factual
  • Exons
  • Genes*
  • Humans
  • Introns
  • Metallothionein / genetics
  • Molecular Sequence Data
  • Proteins / genetics*
  • Sequence Alignment / methods*
  • Sequence Homology, Amino Acid

Substances

  • Proteins
  • DNA
  • Metallothionein