A novel insight into Gene Ontology semantic similarity

Genomics. 2013 Jun;101(6):368-75. doi: 10.1016/j.ygeno.2013.04.010. Epub 2013 Apr 26.

Abstract

Existing methods for computing the semantic similarity between Gene Ontology (GO) terms are often based on external datasets and are therefore not intrinsic to GO. Furthermore, they not only fail to handle identical annotations but also show a strong bias toward well-annotated proteins when used to measure the similarity of proteins. Inspired by the concept of cellular differentiation and dedifferentiation in developmental biology, we propose a shortest semantic differentiation distance (SSDD), based on the concept of semantic totipotency, to measure the semantic similarity of GO terms and, in turn, to compare the functional similarity of proteins. Evaluated against human ratings and a benchmark dataset, SSDD improved upon existing methods for computing the semantic similarity of GO terms. An in-depth analysis shows that SSDD is able to distinguish identical annotations and does not depend on annotation richness, thus producing less biased and more reliable results. Online services can be accessed at the Gene Functional Similarity Analysis Tools website (GFSAT: http://nclab.hit.edu.cn/GFSAT).
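
The abstract does not give the SSDD formula, so the following is only a rough sketch of the general idea of a graph-intrinsic, shortest-path distance between ontology terms. It is not the authors' method: the toy ontology, the unweighted edges, the function names, and the 1 / (1 + d) similarity mapping are all assumptions made for illustration. It shows only that a distance can be computed from the ontology structure alone, with no external annotation corpus.

```python
from collections import deque

# Hypothetical toy ontology (NOT real GO data): child term -> parent terms.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "A1": ["A"],
    "A2": ["A"],
    "B1": ["B"],
}


def build_undirected(parents):
    """Turn the child -> parents map into an undirected adjacency map."""
    adj = {term: set() for term in parents}
    for child, term_parents in parents.items():
        for parent in term_parents:
            adj[child].add(parent)
            adj[parent].add(child)
    return adj


def shortest_distance(adj, src, dst):
    """Breadth-first search: number of edges on a shortest path, or None."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adj[node]:
            if neighbour == dst:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def toy_similarity(adj, a, b):
    """Assumed mapping from distance to similarity in (0, 1]; not SSDD."""
    dist = shortest_distance(adj, a, b)
    return 0.0 if dist is None else 1.0 / (1.0 + dist)


ADJ = build_undirected(PARENTS)
```

In this toy graph, sibling terms such as "A1" and "A2" are closer than terms in different branches such as "A1" and "B1". According to the abstract, SSDD additionally relies on semantic totipotency, which this unweighted sketch does not model.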

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Genes*
  • Genomics / methods
  • Molecular Sequence Annotation*
  • Semantics
  • Sequence Analysis, DNA
  • Software*
  • Vocabulary, Controlled*