PreTKcat: A pre-trained representation learning and machine learning framework for predicting enzyme turnover number

Comput Biol Chem. 2025 Jan 1:115:108327. doi: 10.1016/j.compbiolchem.2024.108327. Online ahead of print.

Abstract

The enzyme turnover number (kcat) is crucial for understanding enzyme kinetics and optimizing biotechnological processes. However, experimentally measured kcat values remain scarce because wet-lab measurements are costly and labor-intensive, which makes robust computational methods necessary. To address this issue, we propose PreTKcat, a framework that integrates pre-trained representation learning and machine learning to predict kcat values. PreTKcat uses the ProtT5 protein language model to encode enzyme sequences and the MolGNet molecular representation learning model to encode substrate molecular graphs. The two representations are combined and passed to an ExtraTrees model, which predicts the kcat value. PreTKcat also accounts for the effect of temperature on kcat prediction, and it can be used to predict the Michaelis constant (Km), a measure of enzyme-substrate affinity. In comparative assessments against several state-of-the-art models, PreTKcat achieves superior performance. PreTKcat serves as an effective tool for investigating enzyme kinetics, offering new perspectives for enzyme engineering and its industrial applications.
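The prediction stage described above (enzyme representation plus substrate representation, optionally with temperature, fed to an ExtraTrees regressor) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random arrays stand in for the ProtT5 and MolGNet embeddings, the embedding widths (1024 and 768), the simple concatenation of features, the log10 transform of the target, and all hyperparameters are assumptions, and `build_features` is a hypothetical helper name.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)

# Assumed sizes; the abstract does not state the embedding widths.
N_SAMPLES, ENZ_DIM, SUB_DIM = 200, 1024, 768

# Placeholders: in a real pipeline these would come from ProtT5
# (enzyme sequences) and MolGNet (substrate molecular graphs).
enzyme_emb = rng.normal(size=(N_SAMPLES, ENZ_DIM))
substrate_emb = rng.normal(size=(N_SAMPLES, SUB_DIM))
temperature = rng.uniform(20.0, 60.0, size=(N_SAMPLES, 1))  # assumed unit: deg C

# Synthetic target; regressing log10(kcat) is an assumption made here.
log10_kcat = rng.normal(size=N_SAMPLES)


def build_features(enz, sub, temp=None):
    """Concatenate per-sample representations column-wise (hypothetical helper)."""
    parts = [enz, sub] if temp is None else [enz, sub, temp]
    return np.hstack(parts)


X = build_features(enzyme_emb, substrate_emb, temperature)

# Simple hold-out split: first 160 samples for training, last 40 for testing.
model = ExtraTreesRegressor(n_estimators=50, random_state=0)
model.fit(X[:160], log10_kcat[:160])
pred = model.predict(X[160:])
```

Calling `build_features` without the `temp` argument gives the temperature-free variant of the feature matrix, and the same sketch would apply to Km prediction by swapping the target array.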

Keywords: Enzyme turnover number; Molecular graph; Representation learning.