Integrating Information in Biological Ontologies and Molecular Networks to Infer Novel Terms

Sci Rep. 2016 Dec 15;6:39237. doi: 10.1038/srep39237.

Abstract

Currently, most terms and term-term relationships in the Gene Ontology (GO) are defined manually, which raises cost, consistency and completeness issues. Recent studies have demonstrated the feasibility of inferring GO automatically from biological networks, an important complementary approach to GO construction. These methods (NeXO and CliXO) are unsupervised, which means that 1) they cannot use the information contained in the existing GO, 2) the way they integrate biological networks may not optimize accuracy, and 3) they are not customized to infer the three different sub-ontologies of GO. Here we present a semi-supervised method called Unicorn that extends these previous methods to tackle the three problems. Unicorn uses a sub-tree of an existing GO sub-ontology as training data to learn the parameters for integrating multiple networks. Cross-validation results show that Unicorn reliably inferred the left-out parts of each specific GO sub-ontology. In addition, when trained on an old version of GO together with biological networks, Unicorn successfully re-discovered some terms and term-term relationships present only in a new version of GO. Unicorn also successfully inferred some novel terms that are not contained in GO but have biological meanings well supported by the literature.
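
The idea of using a known part of an ontology to learn how to weight several networks can be illustrated with a small sketch. This is not Unicorn's actual algorithm, which the abstract does not specify. It assumes that each network is a symmetric gene-gene similarity matrix, that the training sub-tree has already been converted into a target gene-gene similarity matrix, and that a least-squares fit is an acceptable stand-in for the learning step. The function names, the toy data and the weights 0.7 and 0.3 are all hypothetical.

```python
import numpy as np

def learn_network_weights(networks, target_sim, train_genes):
    """Fit one weight per network so that the weighted sum of the networks
    approximates target_sim on pairs of training genes (assumed setup)."""
    idx = np.asarray(train_genes)
    iu = np.triu_indices(len(idx), k=1)  # each unordered training pair once
    # One row per training gene pair, one column per network.
    X = np.column_stack([net[np.ix_(idx, idx)][iu] for net in networks])
    y = target_sim[np.ix_(idx, idx)][iu]
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = np.clip(w, 0.0, None)  # keep weights non-negative
    total = w.sum()
    if total == 0:
        # Fall back to equal weights if the fit is degenerate.
        return np.full(len(networks), 1.0 / len(networks))
    return w / total

def integrate_networks(networks, weights):
    """Combine all networks, over all genes, using the learned weights."""
    return sum(w * net for w, net in zip(weights, networks))

def random_symmetric(rng, n):
    """Toy symmetric similarity matrix (hypothetical data)."""
    m = rng.random((n, n))
    return (m + m.T) / 2

rng = np.random.default_rng(0)
n_genes = 6
net_a = random_symmetric(rng, n_genes)
net_b = random_symmetric(rng, n_genes)

# Hypothetical target similarity; in practice it would be derived from
# the training sub-tree of the ontology.
target = 0.7 * net_a + 0.3 * net_b

train_genes = [0, 1, 2, 3]  # genes covered by the training sub-tree
weights = learn_network_weights([net_a, net_b], target, train_genes)
combined = integrate_networks([net_a, net_b], weights)
```

In this toy setting the weights are fitted only on pairs of training genes, and the combined matrix then covers every gene, including those outside the training sub-tree. A hierarchy-inference method could take such a combined matrix as input in place of a single network.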

Availability: Source code of Unicorn is available at http://yiplab.cse.cuhk.edu.hk/unicorn/.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Biological Ontologies
  • Computational Biology / methods*
  • Gene Regulatory Networks
  • Metabolic Networks and Pathways