A breast cancer subtype classification scheme, PAM50, based on genetic information is widely accepted for clinical applications. On the other hand, experimental cancer biology studies have been successful in revealing the mechanisms of breast cancer, and the hallmarks of cancer have now been defined to explain the core mechanisms of tumorigenesis. It is thus important to understand how the breast cancer subtypes relate to these core mechanisms, but existing studies have yet to address the hallmarks of breast cancer subtypes. A new approach that can explain the differences among breast cancer subtypes in terms of cancer hallmarks is therefore needed. We developed an information-theoretic sub-network mining algorithm, differentially expressed sub-network and pathway analysis (DeSPA), that retrieves tumor-related genes by mining a gene regulatory network (GRN) of transcription factors (TFs) and miRNAs. In extensive experiments on The Cancer Genome Atlas (TCGA) breast cancer sequencing data, we showed that our approach selects genes that belong to cancer core pathways such as the DNA replication, cell cycle, and p53 pathways, while keeping the accuracy of breast cancer subtype classification comparable to that of PAM50. In addition, our method produces a regulatory network of TFs, miRNAs, and their target genes that distinguishes breast cancer subtypes, and this network is supported by experimental studies in the literature.
Keywords: Breast cancer subtype; DNA replication; cancer core mechanisms; cell cycle; information theory; regulatory network; sub-network mining.