A supervised learning method for classifying methylation disorders

BMC Bioinformatics. 2024 Feb 12;25(1):66. doi: 10.1186/s12859-024-05673-1.

Abstract

Background: DNA methylation is one of the most stable and well-characterized epigenetic alterations in humans, and it has already found clinical utility as a molecular biomarker in a variety of disease contexts. Existing methods for the clinical diagnosis of methylation-related disorders detect outliers at a small number of CpG sites using standardized cutoffs that separate healthy from abnormal methylation levels. These standardized cutoffs do not account for methylation patterns that are known to differ between the sexes and with age.

Results: Here we profile genome-wide DNA methylation in blood samples from a cohort of healthy controls of varying age and sex, together with patients with Prader-Willi syndrome (PWS), Beckwith-Wiedemann syndrome, fragile X syndrome, Angelman syndrome, and Silver-Russell syndrome. We propose a Generalized Additive Model (GAM) to perform age- and sex-adjusted outlier analysis of approximately 700,000 CpG sites across the human genome. Using per-site z-scores computed relative to the cohort, we deployed an ensemble-based machine learning pipeline and achieved a combined prediction accuracy of 0.96 (binomial 95% confidence interval 0.868–0.995).
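The age- and sex-adjusted z-score idea can be sketched for a single CpG site as follows. This is a minimal illustration, not the authors' model: instead of a penalized GAM, it swaps in an unpenalized additive regression with a sex indicator and a piecewise-linear (hinge) spline for age, fit by least squares on control samples only. The knot positions and all function names (`design_row`, `fit_site`, `adjusted_z`) are hypothetical. A new sample's z-score is its deviation from the covariate-specific expected methylation, scaled by the residual standard deviation of the controls.

```python
import math
from typing import List, Sequence, Tuple

def design_row(age: float, sex: int, knots: Sequence[float]) -> List[float]:
    """Intercept, sex indicator, linear age, and hinge terms max(0, age - k).

    Hypothetical basis: a simple stand-in for a GAM smooth of age.
    """
    return [1.0, float(sex), age] + [max(0.0, age - k) for k in knots]

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_site(ages: Sequence[float], sexes: Sequence[int], betas: Sequence[float],
             knots: Sequence[float], ridge: float = 1e-6) -> Tuple[List[float], float]:
    """Least-squares fit on control samples for one CpG site.

    Returns (coefficients, residual standard deviation). A tiny ridge term
    on the diagonal keeps the normal equations numerically stable.
    """
    X = [design_row(a, s, knots) for a, s in zip(ages, sexes)]
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, betas)) for i in range(p)]
    coef = solve(xtx, xty)
    resid = [y - sum(c * x for c, x in zip(coef, row)) for row, y in zip(X, betas)]
    dof = max(len(betas) - p, 1)
    sd = math.sqrt(sum(r * r for r in resid) / dof)
    return coef, sd

def adjusted_z(age: float, sex: int, beta: float, coef: Sequence[float],
               sd: float, knots: Sequence[float]) -> float:
    """z-score of an observed methylation value against the age/sex expectation."""
    expected = sum(c * x for c, x in zip(coef, design_row(age, sex, knots)))
    return (beta - expected) / sd
```

In a genome-wide setting this fit would be repeated independently for each CpG site, producing one z-score per site per sample; a real GAM would additionally choose the smoothness of the age effect from the data rather than from fixed knots.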

Conclusion: We demonstrate a method for age- and sex-adjusted outlier detection of differentially methylated loci, based on a large cohort of healthy individuals. We present a custom machine learning pipeline that uses this outlier analysis to classify samples for potential methylation-associated congenital disorders. Combined with machine learning, these methods achieve high accuracy in classifying abnormal methylation patterns.
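To illustrate how per-site z-score profiles can feed an ensemble classifier, and how an accuracy estimate can be given a binomial confidence interval, here is a minimal sketch. The abstract does not specify the ensemble members or the interval method, so both are assumptions: the ensemble is a bagged set of nearest-centroid classifiers combined by majority vote, and the interval is the exact Clopper–Pearson interval. All function names and class labels are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]

def centroid_fit(X: List[Vector], y: List[str]) -> Dict[str, List[float]]:
    """Mean z-score profile for each class label present in the data."""
    groups: Dict[str, List[Vector]] = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    return {lab: [sum(col) / len(rows) for col in zip(*rows)] for lab, rows in groups.items()}

def centroid_predict(model: Dict[str, List[float]], x: Vector) -> str:
    """Label whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(model[lab], x)))

def bagged_fit(X: List[Vector], y: List[str], n_models: int = 25, seed: int = 0):
    """Train each ensemble member on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(centroid_fit([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagged_predict(models, x: Vector) -> str:
    """Majority vote across ensemble members."""
    votes = Counter(centroid_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def _bisect_increasing(f: Callable[[float], float], iters: int = 100) -> float:
    """Root of a function that increases on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) interval for k successes in n trials."""
    lower = 0.0 if k == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)  # P(X >= k) = alpha/2
    upper = 1.0 if k == n else _bisect_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))  # P(X <= k) = alpha/2
    return lower, upper
```

To evaluate such a pipeline, one would count correct predictions on held-out samples and pass `(correct, total)` to `clopper_pearson`. Bagging is used here only because it is a compact, dependency-free example of an ensemble; other ensembles (for example, voting over heterogeneous classifiers) would fit the same description.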

Keywords: Angelman syndrome; Beckwith–Wiedemann syndrome; Congenital disease; Diagnosis; Machine learning; Methylation; Prader–Willi syndrome; Russell–Silver syndrome; Silver–Russell syndrome.

MeSH terms

  • Beckwith-Wiedemann Syndrome* / diagnosis
  • Beckwith-Wiedemann Syndrome* / genetics
  • DNA Methylation
  • Genomic Imprinting
  • Humans
  • Silver-Russell Syndrome* / diagnosis
  • Silver-Russell Syndrome* / genetics
  • Supervised Machine Learning