Comparative genomics and evolution of proteins associated with RNA polymerase II C-terminal domain

Mol Biol Evol. 2005 Nov;22(11):2166-78. doi: 10.1093/molbev/msi215. Epub 2005 Jul 13.

Abstract

The C-terminal domain (CTD) of the largest subunit of RNA polymerase II provides an anchoring point for a wide variety of proteins involved in mRNA synthesis and processing. Most of what is known about CTD-protein interactions comes from animal and yeast models. The consensus sequence and repetitive structure of the CTD are strongly conserved across a wide range of organisms, implying that the same is true of many of its known functions. In some eukaryotic groups, however, the CTD has been allowed to degenerate, suggesting a comparable lack of essential protein interactions. To date, there has been no comprehensive examination of CTD-related proteins across the eukaryotic domain to determine which of its identified functions are correlated with strong stabilizing selection on CTD structure. Here we report a comparative investigation of genes encoding 50 CTD-associated proteins, identifying putative homologs from 12 completed or nearly completed eukaryotic genomes. The presence of a canonical CTD is generally correlated with the apparent presence and conservation of its known protein partners; however, no clear set of interactions emerges that is invariably linked to conservation of the CTD. General rates of evolution, phylogenetic patterns, and the conservation of modeled tertiary structure of capping enzyme guanylyltransferase (Cgt1) indicate a pattern of coevolution of components of a transcription factory organized around the CTD, presumably driven by common functional constraints. These constraints complicate efforts to determine orthologous gene relationships and can mislead phylogenetic and informatic algorithms.

Publication types

  • Comparative Study
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Amino Acid Sequence
  • Base Sequence
  • Computational Biology
  • Genomics / methods
  • Models, Molecular*
  • Molecular Sequence Data
  • Nucleotidyltransferases / genetics
  • Phylogeny*
  • Protein Structure, Tertiary / genetics
  • Proteins / genetics*
  • Proteins / metabolism*
  • RNA Polymerase II / genetics*
  • RNA Polymerase II / metabolism*
  • mRNA Guanylyltransferases

Substances

  • Proteins
  • Nucleotidyltransferases
  • RNA Polymerase II
  • mRNA Guanylyltransferases