The whole is greater than its parts: ensembling improves protein contact prediction

Sci Rep. 2021 Apr 13;11(1):8039. doi: 10.1038/s41598-021-87524-0.

Abstract

Predicting amino acid contacts from protein sequence is an important problem, because contact prediction is a vital step towards predicting folded protein structures. We propose that ensembling, a powerful concept from deep learning, can increase the accuracy of protein contact predictions by combining the outputs of different neural network models. We show that ensembling the predictions made by different groups at the recent Critical Assessment of Protein Structure Prediction (CASP13) outperforms all individual groups. Further, we show that contacts derived from the distance predictions of three additional deep neural networks (AlphaFold, trRosetta, and ProSPr) can be substantially improved by ensembling all three networks. We also show that ensembling these recent deep neural networks with the best CASP13 group creates a superior contact prediction tool. Finally, we demonstrate that two ensembled networks can successfully differentiate between the folds of two highly homologous sequences. To build further on these findings, we propose the creation of a better protein contact benchmark set and additional open-source contact prediction methods.
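
The two operations the abstract mentions, deriving contacts from distance predictions and combining the outputs of several models, can be illustrated with a minimal Python sketch. The abstract does not state the combination rule, the contact cutoff, or the bin layout, so everything here is an assumption made for illustration: simple unweighted averaging stands in for the ensembling step, 8 angstroms is used as the contact cutoff, and the function names (`contacts_from_distances`, `ensemble_mean`), bin edges, and numbers are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def contacts_from_distances(
    bin_probs: Sequence[Sequence[Sequence[float]]],
    bin_upper_edges: Sequence[float],
    threshold: float = 8.0,  # assumed contact cutoff in angstroms
) -> Matrix:
    """Turn per-pair distance-bin probabilities into contact probabilities.

    bin_probs[i][j][k] is the predicted probability that residues i and j
    fall in distance bin k. A bin counts towards a contact if its upper
    edge is at or below the threshold.
    """
    return [
        [
            sum(p for p, edge in zip(pair, bin_upper_edges) if edge <= threshold)
            for pair in row
        ]
        for row in bin_probs
    ]


def ensemble_mean(maps: Sequence[Matrix]) -> Matrix:
    """Average several L x L contact-probability maps element-wise.

    Unweighted averaging is an assumption, not the paper's stated rule.
    """
    if not maps:
        raise ValueError("need at least one contact map")
    size = len(maps[0])
    if any(len(m) != size or any(len(r) != size for r in m) for m in maps):
        raise ValueError("all contact maps must be square and the same size")
    count = len(maps)
    return [
        [sum(m[i][j] for m in maps) / count for j in range(size)]
        for i in range(size)
    ]


# Toy two-residue example with made-up numbers.
map_a = [[0.0, 0.9], [0.9, 0.0]]
map_b = [[0.0, 0.5], [0.5, 0.0]]
distogram = [
    [[0.0, 0.0, 1.0], [0.2, 0.3, 0.5]],
    [[0.2, 0.3, 0.5], [0.0, 0.0, 1.0]],
]
map_c = contacts_from_distances(distogram, bin_upper_edges=[4.0, 8.0, 12.0])
ensembled = ensemble_mean([map_a, map_b, map_c])
```

Averaging probabilities is only one possible choice; weighted averages or a learned combiner would fit the same interface, and nothing in the abstract indicates which was used.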

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Computational Biology / methods
  • Deep Learning
  • Models, Molecular
  • Neural Networks, Computer*
  • Protein Conformation
  • Protein Folding
  • Proteins* / chemistry
  • Proteins* / metabolism
  • Sequence Analysis, Protein / methods
  • Software

Substances

  • Proteins