Comprehensive, high-resolution binding energy landscapes reveal context dependencies of transcription factor binding

Daniel D Le; Tyler C Shimko; Arjun K Aditham; Allison M Keys; Scott A Longwell; Yaron Orenstein; Polly M Fordyce

doi:10.1073/pnas.1715888115

Proc Natl Acad Sci U S A. 2018 Apr 17;115(16):E3702-E3711. doi: 10.1073/pnas.1715888115. Epub 2018 Mar 27.

Authors

Daniel D Le¹, Tyler C Shimko¹, Arjun K Aditham^{2,3}, Allison M Keys^{3,4}, Scott A Longwell², Yaron Orenstein⁵, Polly M Fordyce^{6,2,3,7}

Affiliations

¹ Department of Genetics, Stanford University, Stanford, CA 94305.
² Department of Bioengineering, Stanford University, Stanford, CA 94305.
³ Stanford ChEM-H (Chemistry, Engineering, and Medicine for Human Health), Stanford University, Stanford, CA 94305.
⁴ Department of Chemistry, Stanford University, Stanford, CA 94305.
⁵ Department of Electrical and Computer Engineering, Ben-Gurion University of the Negev, Beer-Sheva, Israel POB 653.
⁶ Department of Genetics, Stanford University, Stanford, CA 94305; pfordyce@stanford.edu.
⁷ Chan Zuckerberg Biohub, San Francisco, CA 94158.

Abstract

Transcription factors (TFs) are primary regulators of gene expression in cells, where they bind specific genomic target sites to control transcription. Quantitative measurements of TF-DNA binding energies can improve predictions of TF occupancy and downstream gene expression in vivo and shed light on how transcriptional networks are rewired throughout evolution. Here, we present a sequencing-based TF binding assay and analysis pipeline (BET-seq, for Binding Energy Topography by sequencing) that provides quantitative, high-resolution estimates of binding energies for more than one million DNA sequences in parallel. Using this platform, we measured binding energies for two model yeast TFs, Pho4 and Cbf1, across all possible combinations of the 10 nucleotides flanking their known consensus DNA target (4¹⁰ = 1,048,576 sequences). A large fraction of these flanking mutations change overall binding energies by an amount equal to or greater than mutations within the consensus site, suggesting that current definitions of TF binding sites may be too restrictive. By systematically comparing binding energy estimates from deep neural networks (NNs) and biophysical models trained on these data, we establish that dinucleotide (DN) specificities are sufficient to explain essentially all variance in observed binding behavior, with Cbf1 binding exhibiting significantly more nonadditivity than Pho4. NN-derived binding energies agree with orthogonal biochemical measurements and reveal that dynamically occupied sites in vivo are both energetically and mutationally distant from the highest-affinity sites.
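The claim that dinucleotide specificities explain essentially all variance, while some TFs show more nonadditivity than others, can be made concrete with a toy comparison of an additive (mononucleotide) model and a dinucleotide model fit to the same energies. The sketch below is a hypothetical illustration, not the BET-seq pipeline or the authors' NN and biophysical models. It assumes, for illustration only, a CACGTG E-box core with five randomized positions on each side. It simulates energies from a random dinucleotide model (the `di_scale` parameter sets the strength of the pairwise, nonadditive terms), samples 2,000 sequences rather than the full 4¹⁰ library, and reports the fraction of variance (R²) each model explains under ordinary least squares. All function names and parameters are hypothetical.

```python
import itertools
import numpy as np

BASES = "ACGT"
CORE = "CACGTG"   # assumed E-box core (illustrative)
HALF_FLANK = 5    # assumed: 5 randomized nt on each side -> 10 flanking nt total
PAIRS = ["".join(p) for p in itertools.product(BASES, repeat=2)]

def library_size(n_flank=2 * HALF_FLANK):
    """Number of distinct flanking combinations: 4**n_flank."""
    return len(BASES) ** n_flank

def random_library(n, rng):
    """Sample n sequences of the form NNNNN + CORE + NNNNN."""
    flanks = rng.choice(list(BASES), size=(n, 2 * HALF_FLANK))
    return ["".join(f[:HALF_FLANK]) + CORE + "".join(f[HALF_FLANK:]) for f in flanks]

def mono_features(seqs):
    """One-hot encode the base at every position (L*4 columns)."""
    L = len(seqs[0])
    X = np.zeros((len(seqs), L * 4))
    for r, s in enumerate(seqs):
        for i, b in enumerate(s):
            X[r, 4 * i + BASES.index(b)] = 1.0
    return X

def di_features(seqs):
    """Mononucleotide one-hot plus one-hot adjacent pairs ((L-1)*16 extra columns)."""
    L = len(seqs[0])
    D = np.zeros((len(seqs), (L - 1) * 16))
    for r, s in enumerate(seqs):
        for i in range(L - 1):
            D[r, 16 * i + PAIRS.index(s[i:i + 2])] = 1.0
    return np.hstack([mono_features(seqs), D])

def r_squared(X, y):
    """Least-squares fit with an intercept; return the fraction of variance explained."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - resid.var() / y.var()

def simulate(n=2000, di_scale=0.5, seed=0):
    """Simulate hypothetical energies from a random DN model and compare model fits."""
    rng = np.random.default_rng(seed)
    seqs = random_library(n, rng)
    L = len(seqs[0])
    w = rng.normal(size=L * 4 + (L - 1) * 16)
    w[L * 4:] *= di_scale  # scale the pairwise (nonadditive) contributions
    y = di_features(seqs) @ w
    return r_squared(mono_features(seqs), y), r_squared(di_features(seqs), y)

mono_r2, di_r2 = simulate(di_scale=0.5)
print(f"mononucleotide R^2 = {mono_r2:.3f}, dinucleotide R^2 = {di_r2:.3f}")
```

When `di_scale` is 0 the simulated energies are purely additive and both models fit them exactly. As `di_scale` grows, the gap between the two R² values widens. That gap is one simple way to express "more nonadditivity" for one TF than another. Least squares on one-hot features is used here only because it is transparent; it is a stand-in for, not a reproduction of, the models used in the study.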

Keywords: microfluidics; protein–DNA binding; transcription factor binding; transcription factor specificity; transcriptional regulation.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Base Sequence
Basic Helix-Loop-Helix Leucine Zipper Transcription Factors / metabolism
Binding Sites
Computer Simulation
DNA / metabolism*
DNA-Binding Proteins / metabolism
E-Box Elements
Gene Library
High-Throughput Nucleotide Sequencing / methods*
Microfluidic Analytical Techniques
Monte Carlo Method
Protein Binding
Saccharomyces cerevisiae Proteins / metabolism
Sequence Analysis, DNA
Thermodynamics
Transcription Factors / metabolism*
Transcription, Genetic

Substances

Basic Helix-Loop-Helix Leucine Zipper Transcription Factors
CBF1 protein, S cerevisiae
DNA-Binding Proteins
PHO4 protein, S cerevisiae
Saccharomyces cerevisiae Proteins
Transcription Factors
DNA

Associated data

figshare/10.6084/m9.figshare.5728467.v1

Grants and funding

R00 GM099848/GM/NIGMS NIH HHS/United States