Interpretable identification of cancer genes across biological networks via transformer-powered graph representation learning

Xiaorui Su; Pengwei Hu; Dongxu Li; Bowei Zhao; Zhaomeng Niu; Thomas Herget; Philip S Yu; Lun Hu

doi:10.1038/s41551-024-01312-5

Nat Biomed Eng. 2025 Jan 9. doi: 10.1038/s41551-024-01312-5. Online ahead of print.

Authors

Xiaorui Su¹ ² ³, Pengwei Hu¹ ², Dongxu Li¹ ², Bowei Zhao¹ ², Zhaomeng Niu⁴, Thomas Herget⁵, Philip S Yu³, Lun Hu⁶ ⁷

Affiliations

¹ Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China.
² University of Chinese Academy of Sciences, Beijing, China.
³ Department of Computer Science, University of Illinois Chicago, Chicago, IL, USA.
⁴ Department of Health Informatics, Rutgers School of Health Professions, Piscataway, NJ, USA.
⁵ Merck KGaA, Darmstadt, Germany.
⁶ Xinjiang Technical Institutes of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, China. hulun@ms.xjb.ac.cn.
⁷ University of Chinese Academy of Sciences, Beijing, China. hulun@ms.xjb.ac.cn.

PMID: 39789329
DOI: 10.1038/s41551-024-01312-5

Abstract

Graph representation learning has been leveraged to identify cancer genes from biological networks. However, its applicability is limited by insufficient interpretability and generalizability under integrative network analysis. Here we report the development of an interpretable and generalizable transformer-based model that accurately predicts cancer genes by leveraging graph representation learning and the integration of multi-omics data with the topologies of homogeneous and heterogeneous networks of biological interactions. The model allows for the interpretation of the respective importance of multi-omics and higher-order structural features. It achieved state-of-the-art performance in the prediction of cancer genes across biological networks (including networks of interactions between miRNA and proteins, transcription factors and proteins, and transcription factors and miRNA) in pan-cancer and cancer-specific scenarios, and it predicted 57 cancer-gene candidates (including three genes that had not been identified by other models) among 4,729 unlabelled genes across 8 pan-cancer datasets. The model's interpretability and generalizability may facilitate the understanding of gene-related regulatory mechanisms and the discovery of new cancer genes.
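The abstract does not describe the model's architecture, so the following is only a generic illustration of the idea it names: transformer-style attention applied to gene features over a network topology. It is not the authors' model. The sketch assumes each gene carries a small multi-omics feature vector and that attention is restricted to a gene's network neighbours. The gene names, feature values, adjacency matrix, and the function `masked_attention` are all hypothetical. The attention weights it returns are one simple way a model of this kind could expose which neighbours contributed to a gene's representation.

```python
import math

def masked_attention(features, adjacency):
    """Scaled dot-product self-attention restricted to network neighbours.

    features:  list of per-gene feature vectors (toy multi-omics values)
    adjacency: adjacency[i][j] is 1 if genes i and j interact; self-loops
               are assumed so that every gene has at least one neighbour
    Returns (updated_features, attention_weights).
    """
    n = len(features)
    dim = len(features[0])
    scale = math.sqrt(dim)
    weights = []
    updated = []
    for i in range(n):
        # Similarity scores, computed only for genes connected to gene i.
        scores = {
            j: sum(a * b for a, b in zip(features[i], features[j])) / scale
            for j in range(n) if adjacency[i][j]
        }
        # Numerically stable softmax over the allowed neighbours.
        top = max(scores.values())
        exp_scores = {j: math.exp(s - top) for j, s in scores.items()}
        total = sum(exp_scores.values())
        row = [exp_scores.get(j, 0.0) / total for j in range(n)]
        weights.append(row)
        # New representation: attention-weighted mix of neighbour features.
        updated.append([
            sum(row[j] * features[j][k] for j in range(n)) for k in range(dim)
        ])
    return updated, weights

# Hypothetical toy data: three genes, three features each.
genes = ["GENE_A", "GENE_B", "GENE_C"]
features = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.2]]
adjacency = [[1, 1, 0], [1, 1, 1], [0, 1, 1]]  # A-B and B-C interact

updated, weights = masked_attention(features, adjacency)
for name, row in zip(genes, weights):
    print(name, [round(w, 3) for w in row])
```

In this toy setting each row of `weights` sums to 1 and is zero for non-interacting gene pairs, so a row can be read as the relative contribution of each neighbour. A real model would add learned projections, multiple heads, and training against labelled cancer genes, none of which are shown here.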

Grants and funding