D-CyPre: a machine learning-based tool for accurate prediction of human CYP450 enzyme metabolic sites

PeerJ Comput Sci. 2024 May 7:10:e2040. doi: 10.7717/peerj-cs.2040. eCollection 2024.

Abstract

The advancement of graph neural networks (GNNs) has made it possible to accurately predict metabolic sites. Although combining GNNs with XGBOOST has shown impressive performance, this approach has not yet been applied to metabolic site prediction. Previous metabolic site prediction tools focused on bonds and atoms without considering the overall molecular skeleton. This study introduces a novel tool, D-CyPre, that combines atom, bond, and molecular skeleton information via two directed message-passing neural networks (D-MPNNs) and uses XGBOOST to predict the metabolic sites of nine cytochrome P450 enzymes. In Precision Mode, D-CyPre produces fewer but more accurate results (Jaccard score: 0.497, F1: 0.660, and precision: 0.737 in the test set). In Recall Mode, it produces less accurate but more comprehensive results (Jaccard score: 0.506, F1: 0.669, and recall: 0.720 in the test set). In the test set of 68 reactants, D-CyPre outperformed BioTransformer on all isoenzymes and CyProduct on most isoenzymes (5/9). For the isoenzymes where D-CyPre outperformed CyProduct, the Jaccard and F1 scores increased by 24% and 16% in Precision Mode (4/9) and by 19% and 12% in Recall Mode (5/9), respectively, relative to CyProduct, the second-best tool. Overall, D-CyPre provides more accurate predictions of human CYP450 enzyme metabolic sites.
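
The four scores quoted above (Jaccard, F1, precision, and recall) can be illustrated with a minimal sketch for a single molecule. It assumes that metabolic sites are represented as sets of atom indices and that the scores follow their standard set-based definitions; the abstract does not state how scores are aggregated across molecules or isoenzymes, and the function name `site_metrics` and the example indices are hypothetical, not taken from D-CyPre.

```python
def site_metrics(true_sites, predicted_sites):
    """Compare predicted metabolic sites with annotated ones for one molecule.

    Both arguments are iterables of site identifiers (assumed here to be
    atom indices). Standard set-based definitions are used; this is an
    illustrative assumption, not the paper's evaluation code.
    """
    true_sites = set(true_sites)
    predicted_sites = set(predicted_sites)

    tp = len(true_sites & predicted_sites)   # correctly predicted sites
    fp = len(predicted_sites - true_sites)   # predicted but not annotated
    fn = len(true_sites - predicted_sites)   # annotated but missed

    # Guard against empty denominators by returning 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    jaccard = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "jaccard": jaccard,
    }


# Hypothetical example: three annotated sites, three predicted, two shared.
example_true = {3, 7, 12}
example_pred = {3, 7, 9}
metrics = site_metrics(example_true, example_pred)
print(metrics)
```

Under these definitions, predicting fewer sites tends to raise precision at the cost of recall, and predicting more sites does the opposite, which matches the trade-off described for the two modes.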

Keywords: Graph neural networks; In silico metabolism prediction; Machine learning.

Grants and funding

This work was supported by the National Natural Science Foundation of China (Nos. 82173957 and 82204599). The State Administration of Traditional Chinese Medicine high-level key discipline of Traditional Chinese Medicine (Analysis of Traditional Chinese Medicine) construction project (zyyzdxk-2023265) funded the APC for this article. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.