Transcriptional activation domains (ADs) of gene activators have remained enigmatic for decades as short, extremely variable, and structurally disordered sequences. Using rational design and high-throughput in vivo experimentation, we determine the grammar rules and exceptions for the language of ADs. According to the identified rules, billions of highly active ADs can be composed of balanced amounts of acidic and aromatic amino acids, either with a mixed composition of aromatic residues or with a single type of aromatic residue mixed with acidic residues. However, equally active sequences can be composed of only aliphatic leucine and aspartic acid residues. These much rarer LD exceptions have a higher hydrophobic-to-acidic ratio and display a specific LDL(L/D)DLL motif. For aromatic/acidic ADs, the intermixing of proline residues in the context of amphipathic α-helix structures significantly increases AD activity. The identified grammar rules and exceptions are interpreted in relation to the biochemistry of AD function and eukaryotic gene expression.
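The two sequence features named above, residue composition and the LDL(L/D)DLL motif, can be illustrated with a minimal Python sketch. This is not the authors' analysis pipeline. It assumes that the motif reads as L-D-L-(L or D)-D-L-L, that "aromatic" means F, W, and Y, and that "acidic" means D and E. The function names are hypothetical.

```python
import re

# Assumption: LDL(L/D)DLL is read as L-D-L-(L or D)-D-L-L.
LD_MOTIF = re.compile(r"LDL[LD]DLL")

# Assumption: residue classes chosen for illustration only.
AROMATIC = set("FWY")  # Phe, Trp, Tyr
ACIDIC = set("DE")     # Asp, Glu


def composition(seq: str) -> dict:
    """Return the fraction of aromatic, acidic, and leucine residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    n = len(seq)
    return {
        "aromatic": sum(c in AROMATIC for c in seq) / n,
        "acidic": sum(c in ACIDIC for c in seq) / n,
        "leucine": seq.count("L") / n,
    }


def has_ld_motif(seq: str) -> bool:
    """Report whether the sequence contains the LDL(L/D)DLL motif."""
    return LD_MOTIF.search(seq.upper()) is not None
```

For example, `has_ld_motif("DDLDLLDLLDD")` is true because the sequence contains LDLLDLL, whereas a sequence with glutamate at the variable position, such as `"LDLEDLL"`, does not match under this reading of the motif.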
Keywords: Biochemistry; Bioinformatics; Biological sciences; Genetics; Natural sciences.
© 2024 The Author(s).