Protein domains provide a new layer of information for classifying human variations in rare diseases

Front Bioinform. 2023 Feb 21:3:1127341. doi: 10.3389/fbinf.2023.1127341. eCollection 2023.

Abstract

Introduction: When sequence variants are interpreted with the ACMG-AMP guidelines, the protein-domain criterion PM1 remains difficult to meet: it is assigned in only about 10% of cases, whereas the variant-frequency criteria PM2/BA1/BS1 are reported in 50% of cases. To improve the classification of human missense variants using protein domain information, we developed the DOLPHIN system (https://dolphin.mmg-gbit.eu).

Methods: We used Pfam alignments of eukaryotic sequences to define DOLPHIN scores, which identify protein domain residues and variants with a significant impact. In parallel, we enriched the gnomAD variant frequencies for each domain residue. Both were validated using ClinVar data.

Results: We applied this method to all potential variants of human transcripts: 30.0% were assigned a PM1 label, whereas 33.2% were eligible for a new benign supporting criterion, BP8. We also showed that DOLPHIN provides an extrapolated frequency for 31.8% of the variants, whereas an original gnomAD frequency is available for only 7.6% of them.

Discussion: Overall, DOLPHIN simplifies the use of the PM1 criterion, extends the application of the PM2/BS1 criteria and introduces a new BP8 criterion. DOLPHIN could facilitate the classification of amino acid substitutions in protein domains, which cover nearly 40% of proteins and are the sites of most pathogenic variants.
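The abstract does not give DOLPHIN's actual formulas, so the following Python sketch only illustrates the general idea behind an extrapolated frequency: pooling gnomAD observations across homologous positions of the same Pfam domain so that a substitution absent from gnomAD can inherit a frequency. The `Variant` type, the residue-to-domain mapping `domain_map`, the function `extrapolated_frequencies`, and the maximum-allele-frequency pooling rule are all hypothetical assumptions, not the published method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    protein: str                 # protein identifier (hypothetical field)
    position: int                # residue position in the protein
    alt_aa: str                  # substituted amino acid, one-letter code
    gnomad_af: Optional[float]   # observed gnomAD allele frequency, None if unseen


def extrapolated_frequencies(
    variants: list[Variant],
    domain_map: dict[tuple[str, int], tuple[str, int]],
) -> dict[Variant, Optional[float]]:
    """Pool observed frequencies across homologous domain positions.

    domain_map (hypothetical) maps (protein, position) to
    (Pfam domain accession, position within the domain alignment).
    Hypothetical pooling rule: a substitution receives the maximum gnomAD AF
    observed for the same alt amino acid at the same domain position in any
    protein carrying that domain.
    """
    pooled: dict[tuple[str, int, str], float] = defaultdict(float)
    for v in variants:
        site = domain_map.get((v.protein, v.position))
        if site is None or v.gnomad_af is None:
            continue  # outside a mapped domain, or not observed in gnomAD
        key = (site[0], site[1], v.alt_aa)
        pooled[key] = max(pooled[key], v.gnomad_af)

    result: dict[Variant, Optional[float]] = {}
    for v in variants:
        site = domain_map.get((v.protein, v.position))
        if site is None:
            result[v] = v.gnomad_af  # no domain: keep the original frequency
        else:
            result[v] = pooled.get((site[0], site[1], v.alt_aa), v.gnomad_af)
    return result


variants = [
    Variant("P1", 10, "R", 0.002),   # observed in gnomAD
    Variant("P2", 55, "R", None),    # unseen, but homologous to P1:10
    Variant("P2", 56, "K", None),    # unseen and outside any mapped domain
]
domain_map = {("P1", 10): ("PF00001", 3), ("P2", 55): ("PF00001", 3)}
freqs = extrapolated_frequencies(variants, domain_map)
print([freqs[v] for v in variants])  # [0.002, 0.002, None]
```

Here the unseen substitution at P2 position 55 inherits the frequency observed at the homologous position of P1, which is how pooling can give a frequency to variants that gnomAD alone does not cover. Other aggregation rules (sum, mean, or weighting by alignment quality) are equally plausible; the abstract does not state which one DOLPHIN uses.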

Keywords: ACMG guidelines; BP8; BS1; PM1; PM2; protein domain; variant classification.

Grants and funding

COG was supported by a PhD grant from the MENESER (Ministère de l’Education Nationale, de l’Enseignement Supérieur et de la Recherche). This work was supported by funding from the European Union’s Horizon 2020 research and innovation program under the European Joint Project for Rare Diseases (EJP-RD) COFUND-EJP N°825575; from the National Institute of Health and Medical Research (Institut National de la Santé et de la Recherche Médicale, INSERM) GOLD cross-cutting program “Genomics variability in health and disease”; and from Aix Marseille University. The Bioinformatics platform is affiliated with the French Institute of Bioinformatics. Funding for open access charge: INSERM.