Cue: a deep-learning framework for structural variant discovery and genotyping

Victoria Popic; Chris Rohlicek; Fabio Cunial; Iman Hajirasouliha; Dmitry Meleshko; Kiran Garimella; Anant Maheshwari

doi:10.1038/s41592-023-01799-x

Nat Methods. 2023 Apr;20(4):559-568. doi: 10.1038/s41592-023-01799-x. Epub 2023 Mar 23.

Authors

Victoria Popic¹, Chris Rohlicek², Fabio Cunial³, Iman Hajirasouliha⁴ ⁵, Dmitry Meleshko⁵ ⁶, Kiran Garimella³, Anant Maheshwari²

Affiliations

¹ Broad Institute of MIT and Harvard, Cambridge, MA, USA. vpopic@broadinstitute.org.
² Broad Institute of MIT and Harvard, Cambridge, MA, USA.
³ Data Sciences Platform, Broad Institute of MIT and Harvard, Cambridge, MA, USA.
⁴ Department of Physiology and Biophysics, Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA.
⁵ Englander Institute for Precision Medicine, The Meyer Cancer Center, Weill Cornell Medicine, New York, NY, USA.
⁶ Tri-Institutional Computational Biology and Medicine Program, Weill Cornell Medicine, New York, NY, USA.

Abstract

Structural variants (SVs) are a major driver of genetic diversity and disease in the human genome, and their discovery is imperative to advances in precision medicine. Existing SV callers rely on hand-engineered features and heuristics to model SVs, which can neither scale to the vast diversity of SVs nor fully harness the information available in sequencing datasets. Here we propose an extensible deep-learning framework, Cue, to call and genotype SVs that can learn complex SV abstractions directly from the data. At a high level, Cue converts alignments to images that encode SV-informative signals and uses a stacked hourglass convolutional neural network to predict the type, genotype and genomic locus of the SVs captured in each image. We show that Cue outperforms the state of the art in the detection of several classes of SVs on synthetic and real short-read data and that it can be easily extended to other sequencing platforms while achieving competitive performance.
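To make the phrase "converts alignments to images" more concrete, here is a minimal illustrative sketch. It is not Cue's implementation. It assumes a single image channel of discordant read-pair counts, with both image axes spanning the same genomic window, so that a pair whose mates land near two breakpoints lights up the pixel indexed by those two positions. The names `ReadPair`, `encode_pairs`, `brightest_pixel` and `peak_to_interval` are hypothetical. Cue's actual channels, resolution and decoding are described in the paper. Picking the brightest raw pixel here is only a crude stand-in for the locus that Cue's stacked hourglass network learns to predict.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical input record (an assumption for this sketch): genomic positions
# of the two mates of a discordant read pair on the same chromosome.
@dataclass
class ReadPair:
    pos1: int
    pos2: int


def encode_pairs(pairs: List[ReadPair], start: int, end: int, n_bins: int) -> List[List[int]]:
    """Bin read pairs into an n_bins x n_bins count image over [start, end).

    Row index = bin of the leftmost mate, column index = bin of the rightmost
    mate. Pairs with either mate outside the window are skipped.
    """
    bin_size = (end - start) / n_bins
    image = [[0] * n_bins for _ in range(n_bins)]
    for p in pairs:
        left, right = sorted((p.pos1, p.pos2))
        if left < start or right >= end:
            continue
        # Clamp to guard against floating-point rounding at the window edge.
        i = min(int((left - start) // bin_size), n_bins - 1)
        j = min(int((right - start) // bin_size), n_bins - 1)
        image[i][j] += 1
    return image


def brightest_pixel(image: List[List[int]]) -> Tuple[int, int]:
    """Return (row, col) of the highest-count pixel (crude stand-in for a learned keypoint)."""
    _, i, j = max((v, i, j) for i, row in enumerate(image) for j, v in enumerate(row))
    return i, j


def peak_to_interval(i: int, j: int, start: int, end: int, n_bins: int) -> Tuple[int, int]:
    """Map pixel (i, j) back to approximate breakpoint coordinates (start of each bin)."""
    bin_size = (end - start) / n_bins
    return int(start + i * bin_size), int(start + j * bin_size)
```

For example, three pairs spanning roughly positions 1,000 and 5,000 in a 10 kb window with 1 kb bins accumulate in pixel (0, 4), which maps back to a candidate breakpoint pair near (1000, 5000). The coordinate resolution is bounded by the bin size, which is one reason a learned model that localizes signals within the image is useful.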

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Cues
Deep Learning*
Genome, Human
Genomic Structural Variation
Genotype
Humans
Software*