Optimization of C-to-G base editors with sequence context preference predictable by machine learning methods

Nat Commun. 2021 Aug 12;12(1):4902. doi: 10.1038/s41467-021-25217-y.

Abstract

Efficient and precise base editors (BEs) for C-to-G transversion are highly desirable. However, how sequence context affects the editing outcome remains largely unclear. Here we report engineered C-to-G BEs of high efficiency and fidelity, with the sequence context predictable via machine-learning methods. By changing the species origin and relative position of uracil-DNA glycosylase and deaminase, together with codon optimization, we obtain optimized C-to-G BEs (OPTI-CGBEs) for efficient C-to-G transversion. The motif preference of OPTI-CGBEs for editing 100 endogenous sites is determined in HEK293T cells. Using an sgRNA library comprising 41,388 sequences, we develop a deep-learning model that accurately predicts the OPTI-CGBE editing outcome for targeted sites with specific sequence context. These OPTI-CGBEs are further shown to be capable of efficient base editing in mouse embryos for generating Tyr-edited offspring. Thus, these engineered CGBEs are useful for efficient and precise base editing, with outcomes predictable from the sequence context of the targeted sites.
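To make the idea of "sequence context" concrete, the following is a minimal Python sketch of how a target site might be turned into model input and how a motif preference might be tabulated. It is not the authors' pipeline or model: the function names, the 3-nt window, the one-hot layout, and the editing fractions in the example are all assumptions chosen for illustration.

```python
from collections import defaultdict

BASES = "ACGT"  # assumed column order for the one-hot encoding


def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values (A, C, G, T order).

    Any character outside ACGT (for example N) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def flanking_motif(protospacer, position):
    """Return the 3-nt context (5' neighbor, target C, 3' neighbor).

    `position` is 1-based within the protospacer. The 3-nt window is an
    illustrative choice, not a parameter taken from the paper.
    """
    i = position - 1
    if protospacer[i].upper() != "C":
        raise ValueError("target position is not a C")
    if i == 0 or i == len(protospacer) - 1:
        raise ValueError("target C lacks a neighbor on one side")
    return protospacer[i - 1:i + 2].upper()


def mean_efficiency_by_motif(records):
    """Average C-to-G editing fractions per flanking motif.

    `records` holds (protospacer, position, efficiency) tuples, where
    efficiency is a fraction between 0 and 1.
    """
    grouped = defaultdict(list)
    for protospacer, position, efficiency in records:
        grouped[flanking_motif(protospacer, position)].append(efficiency)
    return {motif: sum(vals) / len(vals) for motif, vals in grouped.items()}


if __name__ == "__main__":
    # Made-up sequences and editing fractions, for illustration only.
    demo = [("ATCGA", 3, 0.2), ("GTCGT", 3, 0.4), ("AACTA", 3, 0.1)]
    print(mean_efficiency_by_motif(demo))
    print(one_hot("AC"))
```

A table of this kind only summarizes the immediate neighbors of the edited C. A learned model would typically take the encoded full-length target sequence as input instead of a fixed motif window.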

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Base Sequence
  • Binding Sites / genetics
  • CRISPR-Cas Systems*
  • Caenorhabditis elegans / genetics
  • Codon / genetics
  • Cytidine Deaminase / genetics
  • Cytidine Deaminase / metabolism*
  • Escherichia coli / genetics
  • Female
  • Gene Editing / methods*
  • Gene Library
  • HEK293 Cells
  • Humans
  • Machine Learning*
  • Mice
  • Reproducibility of Results
  • Uracil-DNA Glycosidase / genetics
  • Uracil-DNA Glycosidase / metabolism*

Substances

  • Codon
  • Uracil-DNA Glycosidase
  • Cytidine Deaminase