The genetic architecture of protein stability

Andre J Faure; Aina Martí-Aranda; Cristina Hidalgo-Carcedo; Antoni Beltran; Jörn M Schmiedel; Ben Lehner

doi:10.1038/s41586-024-07966-0

Nature. 2024 Oct;634(8035):995-1003. doi: 10.1038/s41586-024-07966-0. Epub 2024 Sep 25.

Authors

Andre J Faure¹ ², Aina Martí-Aranda³ ⁴, Cristina Hidalgo-Carcedo³, Antoni Beltran³, Jörn M Schmiedel³ ⁵, Ben Lehner⁶ ⁷ ⁸ ⁹

Affiliations

¹ Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain. andre.faure@crg.eu.
² ALLOX, Barcelona, Spain. andre.faure@crg.eu.
³ Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain.
⁴ Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK.
⁵ factorize.bio, Berlin, Germany.
⁶ Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain. bl11@sanger.ac.uk.
⁷ Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK. bl11@sanger.ac.uk.
⁸ Universitat Pompeu Fabra (UPF), Barcelona, Spain. bl11@sanger.ac.uk.
⁹ Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain. bl11@sanger.ac.uk.

Abstract

There are more ways to synthesize a 100-amino-acid (aa) protein (20¹⁰⁰) than there are atoms in the universe. Only a very small fraction of such a vast sequence space can ever be experimentally or computationally surveyed. Deep neural networks are increasingly being used to navigate high-dimensional sequence spaces¹. However, these models are extremely complicated. Here, by experimentally sampling from sequence spaces larger than 10¹⁰, we show that the genetic architecture of at least some proteins is remarkably simple, allowing accurate genetic prediction in high-dimensional sequence spaces with fully interpretable energy models. These models capture the nonlinear relationships between free energies and phenotypes but otherwise consist of additive free energy changes with a small contribution from pairwise energetic couplings. These energetic couplings are sparse and associated with structural contacts and backbone proximity. Our results indicate that protein genetics is actually both rather simple and intelligible.
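The kind of model the abstract describes can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a two-state folding model in which the fraction of folded protein is a Boltzmann sigmoid of the folding free energy, which is one common way to express a nonlinear relationship between free energy and phenotype. The function names (`total_dg`, `fraction_folded`), the mutation labels, the temperature and every energy value are hypothetical and chosen only for illustration.

```python
import math
from itertools import combinations

R = 1.987e-3  # gas constant in kcal/(mol*K)
T = 303.0     # assumed temperature in kelvin (illustrative only)


def total_dg(variant, dg_wt, additive, couplings):
    """Folding free energy of a variant: wild-type energy, plus one additive
    term per mutation, plus a sparse set of pairwise coupling terms."""
    dg = dg_wt + sum(additive[m] for m in variant)
    for a, b in combinations(sorted(variant), 2):
        # Pairs without a listed coupling contribute nothing (sparse couplings).
        dg += couplings.get(frozenset((a, b)), 0.0)
    return dg


def fraction_folded(dg):
    """Two-state model (an assumption): nonlinear map from free energy to the
    fraction of molecules folded. Negative dg means a stable fold."""
    return 1.0 / (1.0 + math.exp(dg / (R * T)))


# Hypothetical parameters in kcal/mol, invented for this example.
dg_wt = -2.0
additive = {"A10G": 1.0, "L25P": 1.5, "K40E": 0.2}
couplings = {frozenset(("A10G", "L25P")): -0.5}

wild_type = fraction_folded(total_dg(set(), dg_wt, additive, couplings))
double_mutant = fraction_folded(
    total_dg({"A10G", "L25P"}, dg_wt, additive, couplings)
)
```

With these made-up numbers, the additive terms and the single coupling of the double mutant sum to a folding free energy of zero, so the sigmoid gives a folded fraction of one half. The energies add linearly, and the nonlinearity enters only through the final sigmoid, which is what keeps a model of this form interpretable.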

MeSH terms

Computer Simulation
Deep Learning
Humans
Models, Genetic*
Neural Networks, Computer
Phenotype*
Protein Stability*
Proteins* / chemistry
Proteins* / genetics
Thermodynamics*

Substances

Proteins