Transforming RNA-Seq gene expression to track cancer progression in the multi-stage early to advanced-stage cancer development

PLoS One. 2023 Apr 24;18(4):e0284458. doi: 10.1371/journal.pone.0284458. eCollection 2023.

Abstract

Background: Cancer progression can be tracked through the gene expression changes that occur from early-stage to advanced-stage cancer development. These accumulated genetic changes can be detected when gene expression levels are less variable in advanced-stage samples but highly variable in early-stage samples. Normalizing advanced-stage expression samples against early-stage samples and then clustering the normalized samples can reveal cancers with similar or different progression. It can also provide insight into the clinical and phenotypic patterns of patient samples within the same cancer.
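The abstract does not give the normalization formula. The sketch below is a minimal illustration only. It assumes that "normalizing advanced-stage samples with early-stage samples" means standardizing each gene's advanced-stage values against the early-stage mean and standard deviation, which is a z-score against an early-stage reference. The function name `normalize_against_early` and the dictionary input layout are hypothetical.

```python
from statistics import mean, stdev


def normalize_against_early(advanced, early):
    """Standardize advanced-stage expression against the early-stage distribution.

    ASSUMPTION: a per-gene z-score against early-stage samples is used as a
    stand-in; the abstract does not specify the study's normalization formula.

    advanced, early: dict mapping gene name -> list of expression values,
    one value per sample in that stage.
    """
    normalized = {}
    for gene, adv_values in advanced.items():
        ref = early[gene]
        mu = mean(ref)
        sd = stdev(ref)  # sample standard deviation; needs >= 2 early-stage samples
        if sd == 0:
            # A gene with no early-stage variability cannot be scaled; skip it
            continue
        normalized[gene] = [(v - mu) / sd for v in adv_values]
    return normalized


# Toy data: two genes, two early-stage samples each (hypothetical values).
early = {"G1": [1.0, 3.0], "G2": [5.0, 5.0]}
advanced = {"G1": [4.0, 2.0], "G2": [6.0]}
result = normalize_against_early(advanced, early)
print(result)
```

Take a gene with early-stage values [1.0, 3.0]. An advanced-stage value of 4.0 maps to about 1.414, which is the square root of 2 early-stage standard deviations above the early-stage mean. An advanced-stage value of 2.0 maps to 0.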

Objective: This study aims to investigate cancer progression through RNA-Seq expression profiles across the multi-stage process of cancer development.

Methods: RNA-Seq gene expression data for diffuse large B-cell lymphoma, lung cancer, liver cancer, cervical cancer, and testicular cancer were downloaded from the UCSC Xena database. Advanced-stage samples were normalized against early-stage samples to account for differences in heterogeneity across the multi-stage cancer progression. Weighted gene co-expression network analysis (WGCNA) was used to build a gene network and to group the normalized genes into modules. Gene set enrichment analysis was then used to select key cancer-related gene modules. The diagnostic capacity of the modules was evaluated after hierarchical clustering.
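The network-building and clustering steps can be illustrated with a simplified, self-contained sketch. It makes the following assumptions:

- The adjacency is an unsigned soft-threshold adjacency a_ij = |cor(x_i, x_j)|^beta, with a hypothetical default beta = 6.
- The dissimilarity is 1 - adjacency.
- Genes are merged into modules by average-linkage agglomerative clustering up to a fixed cut height.

Standard WGCNA also computes a topological overlap matrix and uses dynamic tree cutting; both are omitted here. The function names and parameter values are illustrative assumptions, not the study's settings.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither gene is constant across samples


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned WGCNA-style adjacency a_ij = |cor(x_i, x_j)| ** beta.

    expr: dict gene -> list of (normalized) expression values across samples.
    beta=6 is an illustrative default, not a value taken from the study.
    """
    adj = {g: {g: 1.0} for g in expr}
    for g1, g2 in combinations(expr, 2):
        a = abs(pearson(expr[g1], expr[g2])) ** beta
        adj[g1][g2] = adj[g2][g1] = a
    return adj


def average_linkage_modules(adj, cut_height=0.5):
    """Average-linkage agglomerative clustering on dissimilarity 1 - adjacency.

    Clusters are merged until the closest pair is farther apart than cut_height;
    each remaining cluster is treated as a module (singletons included).
    Simplification: real WGCNA clusters on topological overlap with dynamic tree cut.
    """
    clusters = [[g] for g in adj]

    def dist(c1, c2):
        # Mean pairwise dissimilarity between the members of two clusters
        return sum(1.0 - adj[a][b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, i, j = min(
            (dist(clusters[p], clusters[q]), p, q)
            for p, q in combinations(range(len(clusters)), 2)
        )
        if d > cut_height:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


# Toy example: A, B and C are perfectly (anti-)correlated; D only partly follows them.
expr = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [2.0, 4.0, 6.0, 8.0],
    "C": [4.0, 3.0, 2.0, 1.0],
    "D": [1.0, 3.0, 2.0, 4.0],
}
modules = average_linkage_modules(soft_threshold_adjacency(expr))
print(modules)
```

In this toy example, A, B and C end up in one module. Because the adjacency is unsigned, the perfectly anti-correlated gene C joins them. D has |cor| = 0.8 with each of the others, so its adjacency is 0.8^6 (about 0.26). That gives a dissimilarity of about 0.74, above the cut height, and D stays on its own.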

Results: Unnormalized RNA-Seq gene expression failed to segregate advanced-stage samples by cancer cohort. Normalization against early-stage samples revealed the true heterogeneous gene expression that accumulates across multi-stage cancer progression, and this resulted in well-segregated cancer samples. Cancer-specific pathways were enriched in the normalized WGCNA modules. The normalization method was also able to stratify patient samples based on phenotypic and clinical information. Additionally, the method allowed for patient survival analysis: a Cox regression model selected the gene MAP4K1 in cervical cancer, and Kaplan-Meier analysis confirmed that its upregulation is favourable.
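The Kaplan-Meier step can be illustrated with the product-limit estimator S(t) = product over event times t_i <= t of (1 - d_i / n_i). Here d_i is the number of observed events at time t_i and n_i is the number of patients still at risk just before t_i. The sketch below is a minimal pure-Python version. It assumes patients have already been split into groups, for example high versus low MAP4K1 expression; the abstract does not state the split rule used in the study. The function name `kaplan_meier` is hypothetical. The Cox regression and any statistical comparison between groups are not shown.

```python
def kaplan_meier(durations, events):
    """Product-limit (Kaplan-Meier) survival estimate.

    durations: follow-up time per patient.
    events: 1 if the event (e.g. death) was observed, 0 if the patient was censored.
    Returns a list of (time, survival probability) pairs at each time with >= 1 event.
    """
    data = sorted(zip(durations, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Collect all patients (events and censorings) sharing this time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Everyone observed at time t leaves the risk set afterwards
        n_at_risk -= removed
    return curve


# Toy data for one hypothetical patient group (times in arbitrary units).
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
print(curve)
```

In this toy group, survival drops to 0.8 after the first death at time 5, when 1 of 5 patients at risk dies. The patient censored at time 8 then leaves the risk set. At time 12, 2 of the 3 remaining patients die, so survival falls to about 0.267. Computing one such curve per expression group is what a Kaplan-Meier comparison plots.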

Conclusion: Applying the normalization method improved the accuracy of clustering cancer samples according to how they progressed. Additionally, genes responsible for cancer progression were identified.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Female
  • Gene Expression
  • Gene Expression Profiling / methods
  • Humans
  • Male
  • Neoplastic Processes
  • RNA-Seq
  • Testicular Neoplasms*
  • Uterine Cervical Neoplasms*

Grants and funding

This work was supported by the South African Medical Research Council and National Research Foundation of South Africa grant: 121787 (M.L). https://www.samrc.ac.za & http://www.nrf.ac.za. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.