Predicting CRISPR/Cas9 Repair Outcomes by Attention-Based Deep Learning Framework

Cells. 2022 Jun 5;11(11):1847. doi: 10.3390/cells11111847.

Abstract

As a simple and programmable nuclease-based genome-editing tool, the CRISPR/Cas9 system has been widely used for target-gene repair and gene-expression regulation. The DNA mutations generated by CRISPR/Cas9-mediated double-strand breaks determine its biological and phenotypic effects. Experiments have demonstrated that the cellular repair outcomes of CRISPR/Cas9 editing depend on local sequence features. Therefore, the repair outcome after a DNA break can be predicted from the sequence near the cleavage site. However, existing prediction methods rely on manually constructed features or insufficiently detailed prediction labels. As a result, they cannot reach clinical-level prediction accuracy, and their performance is limited by existing knowledge about CRISPR/Cas9 editing. We predict 557 DNA repair labels, covering the vast majority of Cas9-generated mutational outcomes, by building a deep learning model called Apindel to predict CRISPR/Cas9 editing outcomes. Apindel automatically learns DNA sequence features with the GloVe model, introduces positional information through Positional Encoding (PE), and feeds the trained word-vector matrices into a deep learning model containing a BiLSTM and an attention mechanism. Apindel achieves better performance, with more detailed prediction categories, than the most advanced DNA mutation prediction models. It also reveals that nucleotides at different positions relative to the cleavage site have different influences on CRISPR/Cas9 editing outcomes.
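The abstract names the main stages of the pipeline: split DNA into "words", embed them, add positional information, pool with attention, and score 557 repair labels. The following is a minimal, dependency-free sketch of that data flow, not Apindel's implementation. Several details are assumptions rather than facts from the abstract: the k-mer length K, the embedding size DIM, the Transformer-style sinusoidal PE formula, and the dot-product form of attention. All vectors and weights are random stand-ins for trained GloVe vectors and learned parameters. The BiLSTM layer is omitted, and the input sequence is hypothetical.

```python
import math
import random
from itertools import product

NUM_LABELS = 557  # number of repair-outcome classes reported in the abstract
DIM = 8           # hypothetical embedding size, kept small for illustration
K = 3             # hypothetical k-mer length used to turn DNA into "words"

def kmer_tokens(seq, k=K):
    """Split a DNA sequence into overlapping k-mers (the 'words' given to GloVe)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def sinusoidal_pe(length, dim):
    """Transformer-style sinusoidal positional encoding (assumed form), one row per position."""
    table = []
    for pos in range(length):
        row = []
        for j in range(dim):
            angle = pos / (10000 ** (2 * (j // 2) / dim))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(states, query):
    """Dot-product attention: score each position against a query vector and
    return the attention-weighted sum of the states plus the weights."""
    scores = [sum(h * q for h, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    context = [sum(w * s[d] for w, s in zip(weights, states))
               for d in range(len(states[0]))]
    return context, weights

def predict(seq, vectors, query, w_out):
    """Embed k-mers, add positional encodings, pool with attention, and return a
    probability for each of NUM_LABELS outcomes. A BiLSTM would normally sit
    between the embedding and attention steps; it is omitted in this sketch."""
    tokens = kmer_tokens(seq)
    pe = sinusoidal_pe(len(tokens), DIM)
    states = [[v + p for v, p in zip(vectors[t], pe[i])]
              for i, t in enumerate(tokens)]
    context, weights = attention_pool(states, query)
    logits = [sum(w * c for w, c in zip(row, context)) for row in w_out]
    return softmax(logits), weights

# Random stand-ins for trained GloVe vectors and learned weights (assumption).
rng = random.Random(0)
all_kmers = ["".join(p) for p in product("ACGT", repeat=K)]
vectors = {km: [rng.gauss(0, 1) for _ in range(DIM)] for km in all_kmers}
query = [rng.gauss(0, 1) for _ in range(DIM)]
w_out = [[rng.gauss(0, 0.1) for _ in range(DIM)] for _ in range(NUM_LABELS)]

# Hypothetical 23-nt target site (20-nt protospacer + NGG PAM).
probs, attn = predict("GACGCATAAAGATGAGACGCTGG", vectors, query, w_out)
print(len(probs), round(sum(probs), 6))
```

The returned attention weights, one per k-mer position, are the kind of quantity that can be inspected to ask which positions relative to the cleavage site the model relies on. In this sketch they are meaningless because every parameter is random.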

Keywords: DNA repair; attention mechanism; deep learning; positional encoding.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • CRISPR-Cas Systems* / genetics
  • Deep Learning*
  • Endonucleases / genetics
  • Gene Editing / methods
  • Mutation / genetics

Substances

  • Endonucleases

Associated data

  • figshare/10.6084/m9.figshare.7312067

Grants and funding

This work was supported by grants from the National Natural Science Foundation of China (61873027) and an open project of the National Engineering Laboratory for Agri-Product Quality Traceability (No. AQT-2020-YB6).