Leveraging Bioinformatics and Machine Learning for Identifying Prognostic Biomarkers and Predicting Clinical Outcomes in Lung Adenocarcinoma

Kaida Cai; Wenzhi Fu; Hanwen Liu; Xiaofang Yang; Zhengyan Wang; Xin Zhao

doi:10.3390/genes15121497

Genes (Basel). 2024 Nov 21;15(12):1497. doi: 10.3390/genes15121497.

Authors

Kaida Cai¹,²,³, Wenzhi Fu², Hanwen Liu², Xiaofang Yang², Zhengyan Wang², Xin Zhao²,⁴

Affiliations

¹ Department of Epidemiology and Biostatistics, School of Public Health, Southeast University, Nanjing 210009, China.
² Department of Statistics and Actuarial Science, School of Mathematics, Southeast University, Nanjing 211189, China.
³ Key Laboratory of Environmental Medicine Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing 210009, China.
⁴ Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing 210096, China.

PMID: 39766765
DOI: 10.3390/genes15121497

Abstract

Background/Objectives: Lung adenocarcinoma (LUAD) poses significant challenges because of its poor prognosis and limited treatment options, particularly in advanced stages. Identifying genetic biomarkers is crucial for improving outcome prediction and guiding personalized therapy. Methods: In this study, we use a multi-step approach that combines principled sure independence screening, penalized regression methods, and information gain to identify key genetic features in ultra-high-dimensional RNA-sequencing data from LUAD patients. We then evaluate three survival analysis methods, namely the Cox model, survival trees, and random survival forests (RSFs), and compare their predictive performance. Additionally, a protein-protein interaction network is used to explore the biological significance of the identified genes. Results: DKK1 and TNS4 are consistently selected as significant predictors across all feature selection methods. Kaplan-Meier analysis shows that high expression levels of these genes are strongly correlated with poorer survival outcomes, suggesting their potential as prognostic biomarkers. RSF outperforms the Cox and survival tree methods, showing higher AUC and C-index values. The protein-protein interaction network highlights key nodes such as VEGFC and LAMA3, which play central roles in LUAD progression. Conclusions: Our findings provide valuable insights into the genetic mechanisms of LUAD. These results contribute to the development of more accurate prognostic tools and personalized treatment strategies for LUAD.
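
The abstract relies on two standard quantities, the Kaplan-Meier survival estimate and the concordance index (C-index), without defining them. The following is a minimal, dependency-free sketch of both, included only to illustrate what those quantities compute. It is not the authors' implementation: the function names `kaplan_meier` and `harrell_c_index` are hypothetical, the C-index shown is Harrell's version with tied follow-up times simply skipped (a simplifying assumption), and any inputs passed to it would be toy values rather than data from the study.

```python
from itertools import combinations


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival probability).

    times  : follow-up durations
    events : 1 if the event (death) was observed, 0 if censored
    Only time points with at least one observed event are returned.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect every subject whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Both deaths and censored subjects leave the risk set after t.
        n_at_risk -= removed
    return curve


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable pairs ranked correctly.

    A pair is comparable when the subject with the shorter follow-up
    had an observed event. Pairs with tied times are skipped here
    (simplifying assumption). Tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that a has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by risk, which is the scale on which the abstract's comparison of RSF, Cox, and survival tree models should be read.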

Keywords: RNA sequencing data; feature selection; lung adenocarcinoma; machine learning; prognostic biomarkers.

MeSH terms

Adenocarcinoma of Lung* / genetics
Adenocarcinoma of Lung* / mortality
Adenocarcinoma of Lung* / pathology
Biomarkers, Tumor* / genetics
Computational Biology* / methods
Female
Gene Expression Regulation, Neoplastic
Humans
Kaplan-Meier Estimate
Lung Neoplasms* / genetics
Lung Neoplasms* / mortality
Lung Neoplasms* / pathology
Machine Learning*
Male
Prognosis
Protein Interaction Maps* / genetics

Substances

Biomarkers, Tumor
