Background: Neuroblastoma (NB) is the second most common pediatric solid tumor. Because the number of genetic mutations found in tumors is small, even in some patients with unfavorable NB, epigenetic variation is expected to play an important role in NB progression. DNA methylation is a major epigenetic mechanism, and its relationship with NB prognosis has been a subject of interest. One limitation in the analysis of DNA methylation variation is the lack of a suitable analytical model. Therefore, in this study, we performed a random forest (RF) analysis of NB DNA methylome data from multiple databases.
Results: RF is a popular machine learning model owing to its simplicity, intuitiveness, and low computational cost. RF analysis identified novel intermediate-risk patient groups with characteristic DNA methylation patterns within the low-risk group. Feature selection analysis based on probe annotation revealed that enhancer-annotated regions had strong predictive power, particularly for MYCN-amplified NBs. We developed a gene-based analytical model to identify candidate genes related to disease progression, such as PRDM8 and FAM13A-AS1. RF showed sufficient predictive power when compared with other machine learning models.
Conclusions: RF is a useful tool for DNA methylome analysis in cancer epigenetic studies and has the potential to identify novel cancer-related genes.
Keywords: Comparative epigenomics; DNA methylation; Machine learning; Neuroblastoma.
© 2022. The Author(s).