Explore key genes of Crohn's disease based on glycerophospholipid metabolism: A comprehensive analysis Utilizing Mendelian Randomization, Multi-Omics integration, Machine Learning, and SHAP methodology

Int Immunopharmacol. 2024 Nov 15:141:112905. doi: 10.1016/j.intimp.2024.112905. Epub 2024 Aug 21.

Abstract

Background and aims: Crohn's disease (CD) is a chronic, complex inflammatory condition with increasing incidence and prevalence worldwide. However, the causes of CD remain incompletely understood. We identified CD-related metabolites, inflammatory factors, and key genes using Mendelian randomization (MR), multi-omics integration, machine learning (ML), and SHapley Additive exPlanations (SHAP).

Methods: We first performed a mediation MR analysis of 1400 serum metabolites, 91 inflammatory factors, and CD, and found that certain phospholipids are causally related to CD. In the scRNA-seq data, monocytes were divided into high- and low-metabolism groups according to their glycerophospholipid metabolism scores. The genes differentially expressed between these two groups were extracted and subjected to transcription factor prediction, cell communication analysis, and gene set enrichment analysis (GSEA). After further screening of the differentially expressed genes (FDR < 0.05, log2FC > 1), least absolute shrinkage and selection operator (LASSO) regression was performed to obtain hub genes. Models based on the hub genes were built using the CatBoost, XGBoost, and NGBoost methods. We then used the SHAP method to interpret the models and identify the gene with the highest contribution to each model. Finally, qRT-PCR was used to verify the expression of these genes in the peripheral blood mononuclear cells (PBMC) of CD patients and healthy subjects.
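
The abstract does not say which MR estimator or mediation formula was used. As a minimal sketch only, the block below assumes two common choices: a fixed-effect inverse-variance weighted (IVW) estimate for each causal effect, and the product-of-coefficients method for the mediated share. The function names and all numbers are hypothetical illustrations, not the authors' code or data.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate of the causal effect and its standard error.

    Each list holds one entry per genetic instrument (SNP):
    SNP-exposure effect, SNP-outcome effect, and the SNP-outcome standard error.
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


def mediated_proportion(total_effect, exposure_to_mediator, mediator_to_outcome):
    """Product-of-coefficients mediation: indirect effect and its share of the total."""
    indirect = exposure_to_mediator * mediator_to_outcome
    return indirect, indirect / total_effect


# Invented summary statistics for three instruments (illustration only).
bx = [0.1, 0.2, 0.3]        # SNP -> phospholipid level
by = [-0.05, -0.10, -0.15]  # SNP -> CD (log odds)
se = [0.01, 0.01, 0.01]     # standard errors of the SNP -> CD effects

total, total_se = ivw_estimate(bx, by, se)

# Invented step estimates: phospholipid -> mediator, mediator -> CD.
indirect, share = mediated_proportion(total, -0.4, 0.5)
print(round(total, 3), round(indirect, 3), round(share, 3))
```

In a real analysis the three effects (exposure to outcome, exposure to mediator, mediator to outcome) would each come from a separate MR fit with its own instruments, and the uncertainty of the indirect effect would need its own estimate, for example by the delta method or bootstrapping.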

Results: MR results showed that 1-palmitoyl-2-stearoyl-gpc (16:0/18:0) levels, 1-stearoyl-2-arachidonoyl-GPI (18:0/20:4) levels, 1-arachidonoyl-gpc (20:4n6) levels, 1-palmitoyl-2-arachidonoyl-gpc (16:0/20:4n6) levels, and 1-arachidonoyl-GPE (20:4n6) levels were significantly associated with reduced CD risk (FDR < 0.05), with CXCL9 acting as a mediator between these phospholipids and CD. The analysis identified 19 hub genes, with CatBoost, XGBoost, and NGBoost achieving AUCs of 0.91, 0.88, and 0.85, respectively. The SHAP methodology identified the three genes with the highest model contribution: G0S2, S100A8, and PLAUR. The qRT-PCR results showed that the expression levels of S100A8 (p = 0.0003), G0S2 (p < 0.0001), and PLAUR (p = 0.0141) in the PBMC of CD patients were higher than in healthy subjects.
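
The SHAP ranking reported above rests on Shapley values, which split a model's prediction for one sample among its input features. The sketch below computes exact Shapley values by enumerating feature subsets for a toy scoring function. The function `toy_score`, its coefficients, and the expression values are hypothetical and are not the fitted models from the study; tree ensembles are normally explained with faster tree-specific algorithms rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one sample relative to a baseline sample.

    Features outside a coalition take their baseline value. The cost grows
    as 2**n, so this is only practical for a handful of features.
    """
    features = list(instance)
    n = len(features)

    def value(coalition):
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return predict(x)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = set(subset)
                total += weight * (value(coalition | {f}) - value(coalition))
        phi[f] = total
    return phi


def toy_score(x):
    """Hypothetical risk score with one interaction term (illustration only)."""
    return (0.5 * x["G0S2"] + 0.3 * x["S100A8"] + 0.2 * x["PLAUR"]
            + 0.1 * x["G0S2"] * x["S100A8"])


sample = {"G0S2": 2.0, "S100A8": 2.0, "PLAUR": 2.0}      # invented expression values
reference = {"G0S2": 0.0, "S100A8": 0.0, "PLAUR": 0.0}   # invented baseline

phi = shapley_values(toy_score, sample, reference)
ranking = sorted(phi, key=phi.get, reverse=True)
print(ranking)  # features ordered by contribution for this sample
```

The contributions sum to the difference between the prediction for the sample and the prediction for the baseline, which is the property that makes per-gene rankings of this kind comparable across features.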

Conclusion: MR findings suggest that certain phospholipids may lower CD risk. G0S2, S100A8, and PLAUR may be potential pathogenic genes in CD. These phospholipids and genes could serve as novel diagnostic and therapeutic targets for CD.

Keywords: Crohn’s disease; Machine Learning; Mendelian Randomization; Multi-Omics; SHAP.

MeSH terms

  • Crohn Disease* / genetics
  • Genetic Predisposition to Disease
  • Glycerophospholipids* / blood
  • Glycerophospholipids* / metabolism
  • Humans
  • Machine Learning*
  • Mendelian Randomization Analysis*
  • Multiomics

Substances

  • Glycerophospholipids