HYPROSP: a hybrid protein secondary structure prediction algorithm--a knowledge-based approach

Nucleic Acids Res. 2004 Sep 24;32(17):5059-65. doi: 10.1093/nar/gkh836. Print 2004.

Abstract

We develop a knowledge-based approach, called PROSP, for protein secondary structure prediction. The knowledge base contains small peptide fragments together with their secondary structural information. A quantitative measure M, called the match rate, is defined to measure the amount of structural information that a target protein can extract from the knowledge base. Our experimental results show that proteins with a higher match rate are likely to be predicted more accurately by PROSP; that is, prediction accuracy increases roughly monotonically with the amount of structure matched in the knowledge base. To fully utilize the strength of our knowledge base, we propose a hybrid prediction method: if the match rate of a target protein is at least 80%, we use the extracted information to make the prediction; otherwise, we adopt a popular machine-learning approach. This constitutes our hybrid protein structure prediction (HYPROSP) approach. We use the DSSP and EVA data as our datasets and PSIPRED as our underlying machine-learning algorithm. For target proteins with a match rate of at least 80%, the average Q3 of PROSP is 3.96 and 7.2 percentage points higher than that of PSIPRED on DSSP and EVA data, respectively.
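The hybrid rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define how the match rate M is computed, so the sketch takes it as a precomputed percentage, and the names `hybrid_predict`, `knowledge_based`, `machine_learning` and `q3` are hypothetical. The two predictors are passed in as plain callables standing in for PROSP and PSIPRED. Q3 is computed as the usual per-residue three-state accuracy, in percent.

```python
from typing import Callable

# Threshold stated in the abstract: a match rate of at least 80%.
MATCH_RATE_THRESHOLD = 80.0

# A predictor maps an amino-acid sequence to a string of per-residue
# secondary structure states (for example H, E, C), one per residue.
Predictor = Callable[[str], str]


def hybrid_predict(
    sequence: str,
    match_rate: float,
    knowledge_based: Predictor,
    machine_learning: Predictor,
    threshold: float = MATCH_RATE_THRESHOLD,
) -> str:
    """Choose a predictor according to the match rate (in percent).

    How the match rate is obtained is not given in the abstract, so it
    is assumed here to be supplied by the caller.
    """
    if match_rate >= threshold:
        # Enough structural information was extracted from the
        # knowledge base: use the knowledge-based prediction.
        return knowledge_based(sequence)
    # Otherwise fall back to the machine-learning predictor.
    return machine_learning(sequence)


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted state equals the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("state strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

With stub predictors, `hybrid_predict(seq, 85.0, kb, ml)` returns the output of `kb`, while a match rate of 79.9 routes the same sequence to `ml`. Keeping the threshold as a parameter makes it straightforward to examine how accuracy varies with the cut-off.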

Publication types

  • Evaluation Study

MeSH terms

  • Algorithms*
  • Artificial Intelligence*
  • Internet
  • Peptide Fragments / chemistry
  • Protein Structure, Secondary*
  • Reproducibility of Results

Substances

  • Peptide Fragments