Enrichment analysis on regulatory subspaces: A novel direction for the superior description of cellular responses to SARS-CoV-2

Comput Biol Med. 2022 Jul:146:105443. doi: 10.1016/j.compbiomed.2022.105443. Epub 2022 Apr 25.

Abstract

Statement: Enrichment analysis of cell transcriptional responses to SARS-CoV-2 infection based on biclustering solutions yields broader coverage and stronger enrichment of GO terms and KEGG pathways than alternative state-of-the-art machine learning solutions, thus aiding knowledge extraction.

Motivation and methods: The impact of the SARS-CoV-2 virus on infected cells is still not comprehensively understood. This work compares the role of state-of-the-art machine learning approaches in the study of cell regulatory processes affected and induced by the SARS-CoV-2 virus, using transcriptomic data from both infectable cell lines available in public databases and in vivo samples. In particular, we assess the relevance of clustering, biclustering and predictive modeling methods for functional enrichment. Statistical principles to handle scarcity of observations, high data dimensionality, and complex gene interactions are further discussed. Without loss of generality, the proposed methods are applied to study the differential regulatory response of lung cell lines to SARS-CoV-2 (α-variant) against RSV, IAV (H1N1), and HPIV3 viruses.

Results: The results show that, although clustering and predictive algorithms aid classic approaches to functional enrichment analysis, more recent pattern-based biclustering algorithms significantly improve the number and quality of enriched GO terms and KEGG pathways with controlled false positive risks. Additionally, a comparative analysis of these results is performed to identify potential pathophysiological characteristics of COVID-19. These are further compared to those identified by other authors for the same virus as well as for related ones such as SARS-CoV-1. The findings are particularly relevant given the lack of other works using more complex machine learning algorithms within this context.
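As a rough illustration of what enrichment of a gene module with controlled false positive risk can look like, the sketch below runs a hypergeometric over-representation test for each annotation term and then applies a Benjamini-Hochberg correction. This is a common generic procedure and is assumed here only for illustration; the abstract does not state which test or correction the authors used. The function names, the gene identifiers, and the term-to-gene mapping are all hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """Upper-tail probability P(X >= k) for a hypergeometric variable.

    N: background size, K: background genes annotated with the term,
    n: module size, k: module genes annotated with the term.
    """
    upper = min(K, n)
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def enrich(module, background, term_to_genes):
    """Map each term to its adjusted p-value for a gene module.

    `module` stands in for the genes of one bicluster or cluster
    (an assumption made for this sketch).
    """
    module = set(module) & set(background)
    terms = list(term_to_genes)
    raw = []
    for term in terms:
        annotated = set(term_to_genes[term]) & set(background)
        overlap = len(module & annotated)
        raw.append(hypergeom_sf(overlap, len(background), len(annotated), len(module)))
    return dict(zip(terms, benjamini_hochberg(raw)))


if __name__ == "__main__":
    # Toy data: 20 hypothetical genes and two hypothetical terms.
    background = [f"g{i}" for i in range(20)]
    term_to_genes = {
        "T1": [f"g{i}" for i in range(0, 5)],
        "T2": [f"g{i}" for i in range(10, 15)],
    }
    module = ["g0", "g1", "g2", "g3"]
    for term, q in enrich(module, background, term_to_genes).items():
        print(term, q)
```

In this toy setting the module overlaps heavily with the first term and not at all with the second, so only the first would pass a typical significance threshold. Real analyses would draw the background and annotations from GO or KEGG rather than from made-up identifiers.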

Keywords: Biclustering; COVID-19; Computational biology; Discriminative regulatory patterns; Machine learning; SARS-CoV-2; Transcriptomics.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • COVID-19*
  • Cluster Analysis
  • Humans
  • Influenza A Virus, H1N1 Subtype*
  • Machine Learning
  • SARS-CoV-2