Correlation between sequence conservation and the genomic context after gene duplication

Nucleic Acids Res. 2005 Oct 27;33(19):6164-71. doi: 10.1093/nar/gki913. Print 2005.

Abstract

A key complication in comparative genomics for reliable gene function prediction is the existence of duplicated genes. To study the effect of gene duplication on function prediction, we analyze orthologs between pairs of genomes in which the orthologous gene in one genome has duplicated after the speciation of the two genomes (i.e. inparalogs). For these duplicated genes we investigate whether the gene that is most similar at the sequence level is also the gene that has retained the ancestral gene neighborhood. Although the majority of investigated cases show a consistent pattern between sequence similarity and gene-neighborhood conservation, a substantial fraction, 29-38%, is inconsistent. This inconsistency is not a chance outcome owing to insufficient divergence time between the inparalogs; rather, it appears to be a chance outcome caused by very similar rates of sequence evolution of both inparalogs relative to their ortholog. If one-to-one orthologous relationships are required, it is advisable to combine contextual information (i.e. gene neighborhood in prokaryotes and co-expression in eukaryotes) with protein sequence information to predict the most probable functionally equivalent ortholog in the presence of inparalogs.
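The consistency test described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline. It assumes that each inparalog already has a sequence-similarity score to its ortholog (for example, a bit score) and that each gene's neighborhood is represented as the set of ortholog-group IDs of its flanking genes. The names `Gene`, `shared_neighbors`, `classify_pair`, `fraction_inconsistent` and `pick_functional_equivalent` are hypothetical. The tie-breaking rule in `pick_functional_equivalent`, which prefers conserved context and falls back to sequence similarity, is also an assumption. The abstract recommends combining the two signals but does not give a specific rule.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    """Hypothetical record for one inparalog."""
    name: str
    score_to_ortholog: float  # assumed precomputed similarity to the ortholog
    neighbors: FrozenSet[str]  # ortholog-group IDs of flanking genes (assumed)


def shared_neighbors(gene: Gene, ortholog_neighbors: FrozenSet[str]) -> int:
    """Count flanking genes whose ortholog groups also flank the ortholog."""
    return len(gene.neighbors & ortholog_neighbors)


def classify_pair(a: Gene, b: Gene,
                  ortholog_neighbors: FrozenSet[str]) -> Optional[bool]:
    """True if the more sequence-similar inparalog also keeps more of the
    ancestral neighborhood, False if the two signals disagree, and None
    if either signal is tied (the case is uninformative)."""
    na = shared_neighbors(a, ortholog_neighbors)
    nb = shared_neighbors(b, ortholog_neighbors)
    if a.score_to_ortholog == b.score_to_ortholog or na == nb:
        return None
    return (a.score_to_ortholog > b.score_to_ortholog) == (na > nb)


def fraction_inconsistent(
        cases: Iterable[Tuple[Gene, Gene, FrozenSet[str]]]) -> float:
    """Fraction of informative inparalog pairs whose signals disagree."""
    labels = [classify_pair(a, b, ctx) for a, b, ctx in cases]
    informative = [label for label in labels if label is not None]
    if not informative:
        return 0.0
    return sum(1 for label in informative if not label) / len(informative)


def pick_functional_equivalent(a: Gene, b: Gene,
                               ortholog_neighbors: FrozenSet[str]) -> Gene:
    """Assumed combination rule: prefer the inparalog that retains more of
    the ancestral neighborhood, and break ties by sequence similarity."""
    na = shared_neighbors(a, ortholog_neighbors)
    nb = shared_neighbors(b, ortholog_neighbors)
    if na != nb:
        return a if na > nb else b
    return a if a.score_to_ortholog >= b.score_to_ortholog else b
```

For example, suppose one inparalog has the slightly higher score but has lost its ancestral neighbors, while the other has kept them. `classify_pair` returns `False` for that pair, and `pick_functional_equivalent` returns the context-conserving gene. Weighting the two signals differently would be an equally plausible design.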

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Artifacts
  • Bacterial Proteins / genetics
  • Base Sequence
  • Conserved Sequence
  • Evolution, Molecular*
  • Gene Duplication*
  • Genes, Bacterial
  • Genome, Bacterial
  • Genomics / methods*
  • Multigene Family
  • Operon

Substances

  • Bacterial Proteins
  • rfb protein, Bacteria