Evaluation of an automated genome interpretation model for rare disease routinely used in a clinical genetic laboratory

Linyan Meng; Ruben Attali; Tomer Talmy; Yakir Regev; Niv Mizrahi; Pola Smirin-Yosef; Liesbeth Vossaert; Christian Taborda; Michael Santana; Ido Machol; Rui Xiao; Hongzheng Dai; Christine Eng; Fan Xia; Shay Tzur

doi:10.1016/j.gim.2023.100830

Genet Med. 2023 Jun;25(6):100830. doi: 10.1016/j.gim.2023.100830. Epub 2023 Mar 16.

Affiliations

¹ Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX; Baylor Genetics, Houston, TX.
² Genomic Research Department, Emedgene, an Illumina Company, Tel Aviv, Israel.
³ Genomic Research Department, Emedgene, an Illumina Company, Tel Aviv, Israel; Institute of Research in Military Medicine, The Faculty of Medicine, The Hebrew University of Jerusalem, Hadassah Medical Center, Jerusalem, Israel.
⁴ Baylor Genetics, Houston, TX.
⁵ Genomic Research Department, Emedgene, an Illumina Company, Tel Aviv, Israel. Electronic address: stzur@illumina.com.

PMID: 36939041

Abstract

Purpose: The analysis of exome and genome sequencing data for the diagnosis of rare diseases is challenging and time-consuming. In this study, we evaluated an artificial intelligence model, based on machine learning, that automates variant prioritization for diagnosing rare genetic diseases in the Baylor Genetics clinical laboratory.

Methods: The automated analysis model was developed using a supervised learning approach based on thousands of manually curated variants. The model was evaluated on 2 cohorts. Model accuracy was determined using a retrospective cohort comprising 180 randomly selected exome cases (57 singletons, 123 trios), all of which had previously been diagnosed and solved through manual interpretation. Diagnostic yield with the modified workflow was estimated using a prospective "production" cohort of 334 consecutive clinical cases.

Results: The model accurately pinpointed all manually reported variants as candidates. The reported variants were ranked among the top 10 candidate variants in 98.4% (121/123) of trio cases, in 93.0% (53/57) of single proband cases, and in 96.7% (174/180) of all cases. The accuracy of the model was reduced in some cases because of incomplete variant calling (eg, copy number variants) or incomplete phenotypic description.
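The top-10 percentages above follow directly from the reported case counts. The Python sketch below recomputes them and illustrates one plausible way a per-case top-k check could be expressed. The helper names (`top_k_rate`, `in_top_k`) and the example variant identifiers are hypothetical and are not taken from the study, and the rule that every reported variant must fall within the top k is an assumption, since the abstract does not specify how cases with multiple reported variants were scored.

```python
# Counts from the Results paragraph: (cases with the reported variant in the top 10, total cases).
reported = {
    "trio": (121, 123),
    "singleton": (53, 57),
}


def top_k_rate(hits: int, total: int) -> float:
    """Percentage of cases counted as top-k hits, rounded to one decimal place."""
    return round(100 * hits / total, 1)


def in_top_k(ranked_variants: list[str], reported_variants: set[str], k: int = 10) -> bool:
    """Hypothetical per-case check: True if every reported variant is among the first k ranked candidates.

    Assumption: a case counts as a hit only when all reported variants are in the top k.
    """
    return reported_variants.issubset(ranked_variants[:k])


rates = {name: top_k_rate(hits, total) for name, (hits, total) in reported.items()}

# The overall figure is the pooled count across both case types.
all_hits = sum(hits for hits, _ in reported.values())
all_total = sum(total for _, total in reported.values())
overall = top_k_rate(all_hits, all_total)

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"all cases: {overall}% ({all_hits}/{all_total})")
```

Pooling the counts rather than averaging the two percentages matters here because the trio and singleton groups differ in size (123 versus 57 cases).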

Conclusion: The automated model for case analysis assists clinical genetic laboratories in prioritizing candidate variants effectively. The use of such technology may facilitate the interpretation of genomic data for a large number of patients in the era of precision medicine.

Keywords: Clinical genomics; Machine learning methods.

MeSH terms

Artificial Intelligence
Exome / genetics
Humans
Laboratories
Laboratories, Clinical*
Prospective Studies
Rare Diseases* / diagnosis
Rare Diseases* / genetics
Retrospective Studies