Optimized Sequence Library Design for Efficient In Vitro Interaction Mapping

Cell Syst. 2017 Sep 27;5(3):230-236.e5. doi: 10.1016/j.cels.2017.07.006.

Abstract

Sequence libraries that cover all k-mers enable universal, unbiased measurements of binding to both oligonucleotides and peptides. While the number of k-mers grows exponentially in k, space on all experimental platforms is limited. Here, we shrink k-mer library sizes by using joker characters, which represent all characters in the alphabet simultaneously. We present the JokerCAKE (joker covering all k-mers) algorithm for generating a short sequence such that each k-mer appears at least p times with at most one joker character per k-mer. By running our algorithm on a range of parameters and alphabets, we show that JokerCAKE produces near-optimal sequences. Moreover, through comparison with data from hundreds of DNA-protein binding experiments and with new experimental results for both standard and JokerCAKE libraries, we establish that accurate binding scores can be inferred for high-affinity k-mers using JokerCAKE libraries. JokerCAKE libraries allow researchers to search a significantly larger sequence space using the same number of experimental measurements and at the same cost.
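The coverage criterion stated in the abstract (each k-mer appears at least p times, counting only windows with at most one joker character) can be illustrated with a small checker. This is a minimal sketch under stated assumptions, not the JokerCAKE algorithm itself: it only verifies a given sequence and does not construct one. The joker symbol "N" and the function names kmer_coverage and covers_all are hypothetical choices made for illustration.

```python
from collections import Counter
from itertools import product

JOKER = "N"  # assumed joker symbol; the abstract does not specify one


def kmer_coverage(seq, k, alphabet="ACGT", joker=JOKER):
    """Count how often each k-mer is covered by windows of seq.

    A window with no joker covers exactly one k-mer. A window with
    exactly one joker covers every k-mer obtained by substituting an
    alphabet character for the joker. Windows with two or more jokers
    are ignored, matching the "at most one joker per k-mer" rule.
    """
    # Start every possible k-mer at zero so uncovered k-mers are visible.
    counts = Counter({"".join(p): 0 for p in product(alphabet, repeat=k)})
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        n_jokers = window.count(joker)
        if n_jokers == 0:
            counts[window] += 1
        elif n_jokers == 1:
            j = window.index(joker)
            for c in alphabet:
                counts[window[:j] + c + window[j + 1:]] += 1
    return counts


def covers_all(seq, k, p=1, alphabet="ACGT", joker=JOKER):
    """Return True if every k-mer over alphabet is covered at least p times."""
    counts = kmer_coverage(seq, k, alphabet, joker)
    return all(n >= p for n in counts.values())
```

As a toy example over the two-letter alphabet "AC" with k = 2, the joker-free sequence "AACCA" (length 5) covers all four 2-mers, while the joker sequence "NCNA" (length 4) also covers all four, so covers_all("NCNA", 2, alphabet="AC") returns True. This shows on a small scale how jokers can shorten a covering sequence.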

Keywords: de Bruijn graph; microarray design; sequence libraries.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Computational Biology / methods*
  • DNA-Binding Proteins
  • Gene Library
  • Oligonucleotides / chemical synthesis
  • Oligonucleotides / genetics
  • Protein Interaction Mapping / methods*
  • Sequence Analysis, DNA / methods*
  • Software

Substances

  • DNA-Binding Proteins
  • Oligonucleotides