Integrative machine learning analysis of multiple gene expression profiles in cervical cancer

PeerJ. 2018 Jul 25:6:e5285. doi: 10.7717/peerj.5285. eCollection 2018.

Abstract

Although most cervical cancer cases are reported to be closely related to Human Papillomavirus (HPV) infection, there is a need to study the genes that are differentially expressed in the eventual development of cervical cancer following HPV infection. In this study, we propose an integrative machine learning approach to analyse multiple gene expression profiles in cervical cancer in order to identify a set of genetic markers that are associated with cervical cancer and may eventually aid in its diagnosis or prognosis. The proposed integrative analysis is composed of three steps: (i) gene expression analysis of individual datasets; (ii) meta-analysis of multiple datasets; and (iii) feature selection and machine learning analysis. As a result, 21 genes were identified through the integrative machine learning analysis, which included seven supervised methods and one unsupervised method. A functional analysis with GSEA (Gene Set Enrichment Analysis) was performed on the selected 21-gene set and showed significant enrichment in a potential nine-gene expression signature, namely PEG3, SPON1, BTD and RPLP2 (upregulated genes) and PRDX3, COPB2, LSM3, SLC5A3 and AS1B (downregulated genes).
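
The three-step structure described above can be illustrated with a minimal Python sketch. It is not the study's implementation: the abstract does not name the statistics or learners used, so the sketch assumes a standardized mean difference (Cohen's d) for the per-dataset step, a fixed-effect inverse-variance combination for the meta-analysis step, and a plain z-score cutoff standing in for the feature selection and machine learning step. The gene names `GENE_A` and `GENE_B`, the function names, the cutoff of 2.0 and all expression values are hypothetical toy inputs.

```python
import math
from statistics import mean, variance


def cohens_d(tumour, normal):
    """Step (i), assumed statistic: standardized mean difference for one gene
    in one dataset, with its usual large-sample variance approximation."""
    n1, n0 = len(tumour), len(normal)
    pooled_var = ((n1 - 1) * variance(tumour) + (n0 - 1) * variance(normal)) / (n1 + n0 - 2)
    d = (mean(tumour) - mean(normal)) / math.sqrt(pooled_var)
    var_d = (n1 + n0) / (n1 * n0) + d * d / (2 * (n1 + n0))
    return d, var_d


def combine_fixed_effect(effects):
    """Step (ii), assumed method: inverse-variance weighted (fixed-effect)
    pooling of per-dataset effects. Returns (pooled effect, z-score)."""
    weights = [1.0 / var_d for _, var_d in effects]
    pooled = sum(w * d for w, (d, _) in zip(weights, effects)) / sum(weights)
    standard_error = math.sqrt(1.0 / sum(weights))
    return pooled, pooled / standard_error


def rank_genes(datasets, z_cutoff=2.0):
    """Step (iii), heavily simplified: keep genes whose pooled |z| reaches the
    cutoff. A real analysis would pass candidates to feature selection and
    classifiers instead of thresholding."""
    shared_genes = set.intersection(*(set(ds) for ds in datasets))
    results = {}
    for gene in shared_genes:
        per_dataset = [cohens_d(*ds[gene]) for ds in datasets]
        results[gene] = combine_fixed_effect(per_dataset)
    selected = [g for g, (_, z) in results.items() if abs(z) >= z_cutoff]
    selected.sort(key=lambda g: -abs(results[g][1]))
    return selected, results


# Toy data (hypothetical): gene -> (tumour expression values, normal expression values)
dataset_1 = {
    "GENE_A": ([5.1, 5.3, 5.6, 5.2], [3.0, 3.2, 2.9, 3.1]),
    "GENE_B": ([4.0, 4.2, 3.9, 4.1], [4.1, 3.9, 4.0, 4.2]),
}
dataset_2 = {
    "GENE_A": ([6.0, 6.2, 5.9, 6.3], [4.1, 4.0, 4.3, 3.9]),
    "GENE_B": ([2.0, 2.1, 1.9, 2.2], [2.1, 2.0, 2.2, 1.9]),
}

selected, results = rank_genes([dataset_1, dataset_2])
for gene in selected:
    pooled_effect, z = results[gene]
    direction = "up" if pooled_effect > 0 else "down"
    print(f"{gene}: pooled d = {pooled_effect:.2f}, z = {z:.2f} ({direction}regulated in tumour)")
```

In this toy input only `GENE_A` passes the cutoff, because `GENE_B` has identical tumour and normal values in both datasets. Pooling effect sizes rather than raw expression values is one common way to combine datasets measured on different platforms, since each dataset is standardized internally before the combination.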

Keywords: Cervical cancer prognosis; Feature selection; Gene expression profiling; Machine learning; Meta-analysis; Potential gene signature.

Grants and funding

This study was supported by University of Malaya research grants under project numbers RP038C-15AET and BK041-2014. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.