High-throughput SELEX-SAGE method for quantitative modeling of transcription-factor binding sites

Nat Biotechnol. 2002 Aug;20(8):831-5. doi: 10.1038/nbt718. Epub 2002 Jul 8.

Abstract

The ability to determine the location and relative strength of all transcription-factor binding sites in a genome is important both for a comprehensive understanding of gene regulation and for effective promoter engineering in biotechnological applications. Here we present a bioinformatically driven experimental method to accurately define the DNA-binding sequence specificity of transcription factors. A generalized profile was used as a predictive quantitative model for binding sites, and its parameters were estimated from in vitro-selected ligands using standard hidden Markov model training algorithms. Computer simulations showed that several thousand low- to medium-affinity sequences are required to generate a profile of desired accuracy. To produce data on this scale, we applied high-throughput genomics methods to the biochemical problem addressed here. A method combining systematic evolution of ligands by exponential enrichment (SELEX) and serial analysis of gene expression (SAGE) protocols was coupled to an automated quality-controlled sequence extraction procedure based on Phred quality scores. This allowed the sequencing of a database of more than 10,000 potential DNA ligands for the CTF/NFI transcription factor. The resulting binding-site model defines the sequence specificity of this protein with a high degree of accuracy not achieved earlier and thereby makes it possible to identify previously unknown regulatory sequences in genomic DNA. A covariance analysis of the selected sites revealed non-independent base preferences at different nucleotide positions, providing insight into the binding mechanism.
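The two computational steps named in the abstract, quality-controlled sequence extraction and estimation of a quantitative binding-site model, can be illustrated with a minimal sketch. This is not the authors' procedure: the abstract describes a generalized profile trained with hidden Markov model algorithms, whereas the sketch below substitutes a simpler log-odds position weight matrix built from pre-aligned sites. The function names, the quality threshold of 20, the pseudocount, the uniform background, and the toy sequences are all assumptions made for illustration. The only fixed relationship used is the standard Phred definition, Q = -10 log10(P).

```python
import math
from collections import Counter

BASES = "ACGT"

def phred_error_prob(q):
    """Standard Phred relation: error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)

def passes_quality(quals, min_q=20):
    """Keep a read only if every base call meets the threshold.
    The threshold of 20 is an assumed value, not taken from the paper."""
    return all(q >= min_q for q in quals)

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds (base 2) weight matrix from equal-length, pre-aligned sites.
    A simplified stand-in for the generalized profile in the abstract."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        counts = Counter(s[pos] for s in sites)
        pwm.append({
            b: math.log2(((counts[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score(pwm, seq):
    """Sum of per-position log-odds weights for one candidate site."""
    return sum(col[b] for col, b in zip(pwm, seq))

def best_hit(pwm, seq):
    """Return (score, offset) of the highest-scoring window in seq."""
    w = len(pwm)
    return max((score(pwm, seq[i:i + w]), i) for i in range(len(seq) - w + 1))

# Toy aligned sites, invented for illustration only.
toy_sites = ["TTGGC", "TTGGC", "TTGGA", "CTGGC"]
toy_pwm = build_pwm(toy_sites)
hit_score, hit_offset = best_hit(toy_pwm, "AATTGGCAA")
```

With real data, the quality filter would run on base-call scores before any site is admitted to the training set, and the resulting model would be scanned along genomic DNA in the same windowed fashion as `best_hit`. A position weight matrix treats positions as independent, so it cannot capture the non-independent base preferences that the covariance analysis reports.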

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Binding Sites
  • CCAAT-Enhancer-Binding Proteins / metabolism
  • Computational Biology / methods*
  • Computer Simulation
  • Consensus Sequence / genetics
  • DNA / genetics
  • DNA / metabolism
  • DNA-Binding Proteins / metabolism
  • Gene Expression Regulation
  • Genome
  • Genomics / methods*
  • Ligands
  • Models, Biological*
  • NFI Transcription Factors
  • Protein Binding
  • Response Elements / genetics*
  • Substrate Specificity
  • Transcription Factors / metabolism*

Substances

  • CCAAT-Enhancer-Binding Proteins
  • CTF-1 transcription factor
  • DNA-Binding Proteins
  • Ligands
  • NFI Transcription Factors
  • Transcription Factors
  • DNA