Predicting censored survival data based on the interactions between meta-dimensional omics data in breast cancer

J Biomed Inform. 2015 Aug:56:220-8. doi: 10.1016/j.jbi.2015.05.019. Epub 2015 Jun 3.

Abstract

Evaluating survival models that predict cancer patient prognosis is one of the most important areas of emphasis in cancer research. A binary classification approach has difficulty predicting survival directly, both because of censored observations and because its predictive power depends on the threshold used to define the two classes. The traditional Cox regression approach, in turn, does not allow for the identification of interactions between genomic features, which could play key roles in cancer prognosis. In addition, data integration is regarded as an important issue in improving the predictive power of survival models, since cancer can be caused by multiple alterations across meta-dimensional genomic data including the genome, epigenome, transcriptome, and proteome. Here we propose a new integrative framework designed to perform three functions simultaneously: (1) predicting censored survival data; (2) integrating meta-dimensional omics data; and (3) identifying interactions within/between meta-dimensional genomic features associated with survival. To predict censored survival time, martingale residuals were calculated as a new continuous outcome, and a new fitness function based on the mean absolute difference of martingale residuals was implemented for the grammatical evolution neural network (GENN). To test the utility of the proposed framework, a simulation study was conducted, followed by an analysis of meta-dimensional omics data including copy number, gene expression, DNA methylation, and protein expression data in breast cancer retrieved from The Cancer Genome Atlas (TCGA). On the basis of the results from the breast cancer dataset, we were able to identify interactions associated with survival not only within a single dimension of genomic data but also between meta-dimensional omics data. Notably, the predictive power of our best meta-dimensional model was 73%, which outperformed all of the other models built on a single dimension of genomic data. Breast cancer is an extremely heterogeneous disease, and the high levels of genomic diversity within/between breast tumors could affect therapeutic response and the risk of disease progression. Thus, identifying interactions within/between meta-dimensional omics data associated with survival in breast cancer is expected to provide direction for improved meta-dimensional prognostic biomarkers and therapeutic targets.
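
As an illustration of the martingale-residual outcome and the mean-absolute-difference fitness mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the residuals come from a null (covariate-free) model, so that each residual is the event indicator minus the Nelson-Aalen cumulative hazard at the subject's own follow-up time; the abstract does not state which baseline model was used. The function names `nelson_aalen_cumhaz`, `martingale_residuals`, and `mad_fitness` are hypothetical.

```python
import numpy as np

def nelson_aalen_cumhaz(time, event):
    """Nelson-Aalen cumulative hazard evaluated at each subject's own time."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    uniq = np.unique(time[event == 1])  # distinct event times, sorted ascending
    # d_j: number of events at each event time; n_j: subjects still at risk
    d = np.array([np.sum((time == t) & (event == 1)) for t in uniq], dtype=float)
    n = np.array([np.sum(time >= t) for t in uniq], dtype=float)
    cum = np.concatenate(([0.0], np.cumsum(d / n)))
    # H(t_i) sums the increments over all event times <= t_i
    idx = np.searchsorted(uniq, time, side="right")
    return cum[idx]

def martingale_residuals(time, event):
    """Null-model martingale residual (assumption): event minus cumulative hazard."""
    return np.asarray(event, dtype=float) - nelson_aalen_cumhaz(time, event)

def mad_fitness(predicted, residuals):
    """Mean absolute difference between predictions and residuals (lower is better)."""
    predicted = np.asarray(predicted, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    return float(np.mean(np.abs(predicted - residuals)))

# Toy data: follow-up times and event indicators (1 = death, 0 = censored).
time = [2.0, 3.0, 5.0, 7.0]
event = [1, 0, 1, 1]
res = martingale_residuals(time, event)
print(res)
print(mad_fitness(np.zeros(4), res))
```

In this sketch a candidate network would be scored by passing its predictions for each patient to `mad_fitness` together with the residuals; residuals near 1 mark patients who died earlier than the baseline hazard would suggest, and strongly negative values mark patients who survived longer than expected.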

Keywords: Breast cancer; Data integration; Interaction between multi-omics data; Survival prediction; TCGA.

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Breast Neoplasms / genetics
  • Breast Neoplasms / metabolism
  • Breast Neoplasms / mortality*
  • Computational Biology / methods
  • Computer Simulation
  • DNA Methylation
  • Data Collection*
  • Disease Progression
  • Epigenomics
  • Female
  • Gene Expression Profiling
  • Genome, Human
  • Genomics
  • Humans
  • Medical Informatics / methods*
  • Models, Statistical
  • Neural Networks, Computer
  • Prognosis
  • Proportional Hazards Models
  • Proteome
  • Software
  • Survival Analysis*
  • Transcriptome
  • Treatment Outcome

Substances

  • Proteome