Integrating multiomics and prior knowledge: a study of the Graphnet penalty impact

Bioinformatics. 2023 Aug 1;39(8):btad454. doi: 10.1093/bioinformatics/btad454.

Abstract

Motivation: In oncology, statistical models are used to discover candidate factors that influence the development of the pathology or its outcome. These models can be designed in a multiblock framework to study the relationships among different multiomic data, and variable selection is often achieved by imposing constraints on the model parameters. A priori graph constraints have been used in the literature to improve feature selection in the model, yielding more interpretable results. However, it is still unclear how these graphs interact with the models and how they affect feature selection. Moreover, because the available graphs encode different information, it is also unclear whether the choice of graph meaningfully affects the results obtained.

Results: We studied the impact of the graph penalty on a multiblock model, using sparse generalized canonical correlation analysis (SGCCA) as the multiblock framework and the TCGA-LGG dataset as the test case. Our findings are threefold. First, we showed that the graph penalty increases the number of genes selected from this dataset, while selecting genes already identified in other works as pertinent biomarkers of the pathology. Second, we demonstrated that using different graphs leads to different though consistent results, and that graph density is the main factor influencing the results obtained. Finally, we showed that the graph penalty improves both the performance of survival prediction from the model-derived components and the interpretability of the results.
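To make the notion of a graph penalty concrete, the following is a minimal sketch and not the netSGCCA implementation. It assumes the usual GraphNet form, an L1 term plus a Laplacian smoothness term that sums squared weight differences over the edges of an unweighted graph. The function name, parameter names, and toy gene graph are hypothetical and chosen only for illustration.

```python
def graphnet_penalty(weights, edges, l1_strength, graph_strength):
    """Illustrative GraphNet-style penalty (assumed form, not the paper's code).

    weights        : list of floats, one weight per gene
    edges          : list of (i, j) index pairs from an unweighted prior graph
    l1_strength    : multiplier on the sparsity (L1) term
    graph_strength : multiplier on the graph smoothness term
    """
    # Sparsity term: sum of absolute weights.
    l1_term = sum(abs(w) for w in weights)
    # Smoothness term: equals w^T L w for the Laplacian L of an unweighted graph.
    smooth_term = sum((weights[i] - weights[j]) ** 2 for i, j in edges)
    return l1_strength * l1_term + graph_strength * smooth_term


# Hypothetical toy graph: a chain gene0 - gene1 - gene2.
chain_edges = [(0, 1), (1, 2)]
# A denser hypothetical graph on the same genes: all pairs connected.
dense_edges = [(0, 1), (1, 2), (0, 2)]

agreeing = [1.0, 1.0, 1.0]      # connected genes share the same weight
disagreeing = [1.0, -1.0, 1.0]  # neighbouring genes have opposite weights

penalty_agreeing = graphnet_penalty(agreeing, chain_edges, 0.1, 0.5)
penalty_disagreeing = graphnet_penalty(disagreeing, chain_edges, 0.1, 0.5)
print(penalty_agreeing, penalty_disagreeing)
```

Under this assumed form, weight vectors that differ across connected genes are penalised more heavily, which encourages connected genes to receive similar weights. Because every additional edge adds one more squared-difference term, a denser graph can only keep or raise the smoothness term for a given weight vector. This gives one intuition for why graph density could matter, although the sketch does not reproduce the paper's analysis.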

Availability and implementation: Source code is freely available at https://github.com/neurospin/netSGCCA.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Models, Statistical
  • Multiomics*
  • Software*