EnzymeNet: residual neural networks model for Enzyme Commission number prediction

Bioinform Adv. 2023 Nov 24;3(1):vbad173. doi: 10.1093/bioadv/vbad173. eCollection 2023.

Abstract

Motivation: Enzymes are key targets to biosynthesize functional substances in metabolic engineering. Therefore, various machine learning models have been developed to predict Enzyme Commission (EC) numbers, one of the enzyme annotations. However, the previously reported models might predict the sequences with numerous consecutive identical amino acids, which are found within unannotated sequences, as enzymes.

Results: Here, we propose EnzymeNet for prediction of complete EC numbers using residual neural networks. EnzymeNet can exclude the exceptional sequences described above. Several EnzymeNet models were built and optimized to explore the best conditions for removing such sequences. As a result, the models exhibited higher prediction accuracy with macro F1 score up to 0.850 than previously reported models. Moreover, even the enzyme sequences with low similarity to training data, which were difficult to predict using the reported models, could be predicted extensively using EnzymeNet models. The robustness of EnzymeNet models will lead to discover novel enzymes for biosynthesis of functional compounds using microorganisms.

Availability and implementation: The source code of EnzymeNet models is freely available at https://github.com/nwatanbe/enzymenet.