Subfamily HMMs in functional genomics

Pac Symp Biocomput. 2005:322-33.

Abstract

The limitations of homology-based methods for prediction of protein molecular function are well known; differences in domain structure, gene duplication events and errors in existing database annotations complicate this process. In this paper we present a method to detect and model protein subfamilies, which can be used in high-throughput, genome-scale phylogenomic inference of protein function. We demonstrate the method on a set of nine PFAM families, and show that subfamily HMMs provide greater separation of homologs and non-homologs than is possible with a single HMM for each family. We also show that subfamily HMMs can be used for functional classification with a very low expected error rate. The BETE method for identifying functional subfamilies is illustrated on a set of serotonin receptors.
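To make the classification idea concrete, here is a minimal sketch of assigning a query sequence to the best-scoring subfamily model. It is not the authors' implementation: the function name `classify_by_subfamily`, the subfamily labels, the scores, and the threshold are all hypothetical, and the scores are assumed to come from some external HMM scoring tool in which higher means a better match.

```python
from typing import Dict, Optional, Tuple


def classify_by_subfamily(
    scores: Dict[str, float], threshold: float
) -> Tuple[Optional[str], float]:
    """Pick the top-scoring subfamily for one query sequence.

    scores    -- hypothetical per-subfamily HMM scores (higher = better match)
    threshold -- hypothetical cutoff below which the query is left unclassified

    Returns (subfamily label or None, best score).
    """
    if not scores:
        raise ValueError("no subfamily scores supplied")

    # The best subfamily score also serves as the family-level score,
    # so one strong subfamily match is enough to call a homolog.
    best = max(scores, key=scores.get)
    best_score = scores[best]

    if best_score < threshold:
        return None, best_score  # treated as a non-homolog in this sketch
    return best, best_score


# Toy values for illustration only; not data from the paper.
toy_scores = {"subfam_A": 12.5, "subfam_B": 48.0, "subfam_C": -3.1}
label, score = classify_by_subfamily(toy_scores, threshold=20.0)
print(label, score)
```

In this sketch the family-level decision and the subfamily assignment come from the same maximum, which is one simple way to use a set of subfamily models in place of a single family model.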

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Animals
  • Bayes Theorem
  • Biological Evolution
  • Databases, Nucleic Acid
  • Databases, Protein
  • Enzymes / genetics
  • Gene Duplication
  • Genomics*
  • Markov Chains
  • Models, Genetic
  • Phylogeny
  • Proteins / chemistry
  • Proteins / genetics
  • Sequence Alignment

Substances

  • Enzymes
  • Proteins