Determination of a Screening Metric for High Diversity DNA Libraries

PLoS One. 2016 Dec 8;11(12):e0167088. doi: 10.1371/journal.pone.0167088. eCollection 2016.

Abstract

The fields of antibody engineering, enzyme optimization and pathway construction rely increasingly on screening complex variant DNA libraries. These highly diverse libraries allow researchers to sample a maximized sequence space; and therefore, more rapidly identify proteins with significantly improved activity. The current state of the art in synthetic biology allows for libraries with billions of variants, pushing the limits of researchers' ability to qualify libraries for screening by measuring the traditional quality metrics of fidelity and diversity of variants. Instead, when screening variant libraries, researchers typically use a generic, and often insufficient, oversampling rate based on a common rule-of-thumb. We have developed methods to calculate a library-specific oversampling metric, based on fidelity, diversity, and representation of variants, which informs researchers, prior to screening the library, of the amount of oversampling required to ensure that the desired fraction of variant molecules will be sampled. To derive this oversampling metric, we developed a novel alignment tool to efficiently measure frequency counts of individual nucleotide variant positions using next-generation sequencing data. Next, we apply a method based on the "coupon collector" probability theory to construct a curve of upper bound estimates of the sampling size required for any desired variant coverage. The calculated oversampling metric will guide researchers to maximize their efficiency in using highly variant libraries.

MeSH terms

  • Gene Library*
  • Genetic Variation* / genetics
  • High-Throughput Nucleotide Sequencing
  • Humans
  • Models, Theoretical
  • Probability
  • Sequence Alignment

Grants and funding

Gen9 provided support in the form of salaries for authors NJG, SH, EMJ, DL, LAK, but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.