The development of a universal in silico predictor of protein-protein interactions

PLoS One. 2013 May 31;8(5):e65587. doi: 10.1371/journal.pone.0065587. Print 2013.

Abstract

Protein-protein interactions (PPIs) are essential for understanding the function of biological systems and have been characterized using a vast array of experimental techniques. These techniques detect only a small proportion of all PPIs and are labor-intensive and time-consuming. Therefore, the development of computational methods capable of predicting PPIs accelerates the pace of discovery of new interactions. This paper reports a machine learning-based prediction model, the Universal In Silico Predictor of Protein-Protein Interactions (UNISPPI), which is a decision tree model that can reliably predict PPIs for all species (including proteins from parasite-host associations) using only 20 combinations of amino acid frequencies from interacting and non-interacting proteins as learning features. On an independent test set, UNISPPI correctly classified 79.4% of experimentally supported interactions and 72.6% of non-interacting protein pairs. Moreover, UNISPPI suggests that the frequencies of the amino acids asparagine, cysteine and isoleucine are important features for distinguishing between interacting and non-interacting protein pairs. We envisage that UNISPPI can be a useful tool for prioritizing interactions for experimental validation.
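
As a rough illustration of the kind of input described above, the following sketch computes amino acid frequencies for two sequences and combines them into 20 features for a protein pair. It is not the UNISPPI implementation: the abstract does not state how the frequencies are combined, so the per-residue sum used here is an assumption, and the names `AMINO_ACIDS`, `aa_frequencies` and `pair_features` are hypothetical.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequence):
    """Return the relative frequency of each standard amino acid in a sequence."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def pair_features(seq_a, seq_b):
    """Combine two frequency profiles into 20 order-independent features.

    ASSUMPTION: summing the two frequencies per residue is illustrative only;
    the abstract does not specify the combination rule.
    """
    freq_a = aa_frequencies(seq_a)
    freq_b = aa_frequencies(seq_b)
    return [freq_a[aa] + freq_b[aa] for aa in AMINO_ACIDS]


# Example with two made-up peptide sequences (not real proteins).
features = pair_features("MKTAYIAKQRNC", "MNNCCIILLKDE")
```

A vector like `features`, labeled as interacting or non-interacting, is the sort of row a decision tree learner could be trained on. A symmetric combination is chosen here so that the order of the two proteins in a pair does not change the features.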

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artificial Intelligence*
  • Computational Biology / methods
  • Computer Simulation
  • Models, Biological*
  • Protein Binding
  • Protein Interaction Mapping / methods*
  • Reproducibility of Results

Grants and funding

This work received financial support from FAPESP, grant numbers 2009/05234-4 and 2013/02018-4, and from CNPq, grant number 475147/2010-3. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.