Evolutionary approach to predicting the binding site residues of a protein from its primary sequence

Proc Natl Acad Sci U S A. 2011 Mar 29;108(13):5313-8. doi: 10.1073/pnas.1102210108. Epub 2011 Mar 14.

Abstract

Protein binding site residues, especially catalytic residues, play a central role in protein function. Because more than 99% of the ∼12 million protein sequences in the nonredundant protein database have no structural information, it is desirable to develop methods to predict the binding site residues of a protein from its primary sequence. This task is highly challenging because the binding site residues constitute only a small portion of a protein. However, the binding site residues of a protein are clustered in its functional pocket(s), and their spatial patterns tend to be conserved in evolution. To take advantage of these evolutionary and structural principles, we constructed a database of ∼50,000 templates (called the pocket-containing segment database), each of which includes not only a sequence segment that contains a functional pocket but also the structural attributes of the pocket. To use this database, we designed a template-matching technique, termed residue-matching profiling, and established a criterion for selecting templates for a query sequence. Finally, we developed a probabilistic model that assigns spatial scores to matched residues between the template and query sequence in local alignments, using a set of selected scoring matrices, and computes the binding likelihood of each matched residue in the query sequence. From the likelihoods, one can predict the binding site residues in the query sequence. An automated computational pipeline was developed for our method. A performance evaluation shows that our method achieves 70% precision in predicting binding site residues at 60% sensitivity.
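The final step described above, turning per-residue binding likelihoods into a predicted residue set and scoring it by precision and sensitivity, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the 0.5 threshold, and the residue positions and likelihood values are all hypothetical and chosen only to show how the two reported metrics are defined (precision = TP / (TP + FP), sensitivity = TP / (TP + FN)).

```python
def predict_binding_residues(likelihoods, threshold):
    """Return positions whose binding likelihood meets the threshold.

    `likelihoods` maps residue position -> likelihood (hypothetical values).
    """
    return {pos for pos, p in likelihoods.items() if p >= threshold}


def precision_and_sensitivity(predicted, actual):
    """Compute precision and sensitivity for a predicted residue set."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: fraction of predicted residues that are true binding residues.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Sensitivity (recall): fraction of true binding residues that were predicted.
    sensitivity = true_positives / len(actual) if actual else 0.0
    return precision, sensitivity


# Hypothetical example data, not taken from the paper.
likelihoods = {12: 0.91, 15: 0.80, 40: 0.75, 41: 0.30, 77: 0.65, 102: 0.10}
actual = {12, 15, 40, 88, 102}

predicted = predict_binding_residues(likelihoods, threshold=0.5)
precision, sensitivity = precision_and_sensitivity(predicted, actual)
print(sorted(predicted), precision, sensitivity)
```

Raising the threshold in a sketch like this generally trades sensitivity for precision, which is why the abstract reports precision at a stated sensitivity level rather than as a single number.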

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Amino Acid Sequence*
  • Binding Sites
  • Biological Evolution*
  • Databases, Protein
  • Models, Molecular
  • Molecular Sequence Data
  • Protein Folding
  • Protein Structure, Tertiary
  • Proteins / chemistry*
  • Proteins / genetics*
  • Sequence Analysis, Protein / methods*

Substances

  • Proteins