Quantitative evaluation of binding affinity changes upon mutations is crucial for protein engineering and drug design. Machine learning-based methods are gaining increasing momentum in this field. Due to the limited number of experimental data, using a small number of sensitive predictive features is vital to the generalization and robustness of such machine learning methods. Here we introduce a fast and reliable predictor of binding affinity changes upon single point mutation, based on a random forest approach. Our method, iSEE, uses a limited number of interface Structure, Evolution, and Energy-based features for the prediction. iSEE achieves, using only 31 features, a high prediction performance with a Pearson correlation coefficient (PCC) of 0.80 and a root mean square error of 1.41 kcal/mol on a diverse training dataset consisting of 1102 mutations in 57 protein-protein complexes. It competes with existing state-of-the-art methods on two blind test datasets. Predictions for a new dataset of 487 mutations in 56 protein complexes from the recently published SKEMPI 2.0 database reveals that none of the current methods perform well (PCC < 0.42), although their combination does improve the predictions. Feature analysis for iSEE underlines the significance of evolutionary conservations for quantitative prediction of mutation effects. As an application example, we perform a full mutation scanning of the interface residues in the MDM2-p53 complex.
Keywords: binding affinity; full mutation scanning; machine learning; protein-protein interactions; single point mutation.
© 2018 The Authors. Proteins: Structure, Function, and Bioinformatics published by Wiley Periodicals, Inc.