Comprehensive Analysis of Large-Scale Transcriptomes from Multiple Cancer Types

Genes (Basel). 2021 Nov 24;12(12):1865. doi: 10.3390/genes12121865.

Abstract

Various abnormalities of transcriptional regulation revealed by RNA sequencing (RNA-seq) have been reported in cancers. However, strategies for integrating multi-modal information from RNA-seq, which would help uncover more disease mechanisms, are still limited. Here, we present PipeOne, a cross-platform, one-stop analysis workflow for large-scale transcriptome data. It was developed based on Nextflow, a reproducible workflow management system. PipeOne is composed of three modules: data processing and feature matrix construction, disease feature prioritization, and disease subtyping. It first integrates eight different tools to extract different types of information from RNA-seq data, and then uses a random forest algorithm to study and stratify patients according to evidence from the multi-modal information. Its application to five cancers (colon, liver, kidney, stomach, and thyroid; total samples n = 2024) identified various dysregulated key features (such as PVT1 expression and ABI3BP alternative splicing) and pathways (especially liver and kidney dysfunction) shared by multiple cancers. Furthermore, we demonstrated clinically relevant patient subtypes in four of the five cancers, with most subtypes characterized by distinct driver somatic mutations, such as TP53, TTN, BRAF, HRAS, MET, KMT2D, and KMT2C mutations. Importantly, these subtyping results were frequently attributable to dysregulated biological processes, such as ribosome biogenesis, RNA binding, and mitochondrial functions. PipeOne is efficient and accurate in studying different cancer types, revealing both the cancer-specific and the cross-cancer contributing factors of each cancer. It could be easily applied to other diseases and is available on GitHub.

Keywords: RNA-seq workflow; TCGA; alternative splicing; cancer subtyping; feature prioritization; mitochondria; ribosome; somatic mutation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alternative Splicing / genetics
  • Computational Biology / methods
  • Gene Expression Profiling / methods
  • Genome, Human / genetics
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Mitochondria / genetics
  • Mutation / genetics
  • Neoplasms / genetics*
  • RNA / genetics
  • Ribosomes / genetics
  • Sequence Analysis, RNA / methods
  • Signal Transduction / genetics
  • Transcriptome / genetics*

Substances

  • RNA