Evaluation of pre-processing on the meta-analysis of DNA methylation data from the Illumina HumanMethylation450 BeadChip platform

Claudia Sala; Pietro Di Lena; Danielle Fernandes Durso; Andrea Prodi; Gastone Castellani; Christine Nardini

doi:10.1371/journal.pone.0229763


PLoS One. 2020 Mar 10;15(3):e0229763. doi: 10.1371/journal.pone.0229763. eCollection 2020.

Authors

Claudia Sala¹, Pietro Di Lena², Danielle Fernandes Durso³, Andrea Prodi⁴, Gastone Castellani⁵,⁶, Christine Nardini⁷,⁸,⁹

Affiliations

¹ Department of Physics and Astronomy, University of Bologna, Bologna, Italy.
² Department of Computer Science and Engineering, University of Bologna, Bologna, Italy.
³ Division of Infectious Diseases and Immunology, University of Massachusetts Medical School, Worcester, Massachusetts, United States of America.
⁴ Smart Cities Living Lab, Institute of Organic Synthesis and Photoreactivity, CNR, Bologna, Italy.
⁵ Department of Experimental, Diagnostic and Specialty Medicine, University of Bologna, Bologna, Italy.
⁶ Interdepartmental Center "L. Galvani", University of Bologna, Bologna, Italy.
⁷ Department of Laboratory Medicine, Karolinska Institutet, Stockholm, Sweden.
⁸ CNR IAC "Mauro Picone", Roma, Italy.
⁹ Sol Group, Monza, Italy.

Abstract

Introduction: Meta-analysis is a powerful means of combining the hundreds of experiments being run worldwide into more statistically powerful analyses. This also holds for omic data, including genome-wide DNA methylation. In particular, thousands of DNA methylation profiles generated with the Illumina 450k BeadChip are stored in the publicly accessible Gene Expression Omnibus (GEO) repository. Often, however, the intensity values produced by the BeadChip (raw data) are not deposited, so only pre-processed values (obtained after computational manipulation) are available. Because pre-processing may differ among studies, it may affect meta-analysis by introducing non-biological sources of variability.
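For reference, the pre-processed values found in GEO for this platform are often β-values, derived from the methylated (M) and unmethylated (U) intensities of each probe. The sketch below assumes the conventional Illumina definition with an offset of 100. It deliberately omits background correction, dye-bias correction and normalization, which are exactly the steps that differ between pipelines. The function name `beta_values` is hypothetical.

```python
import numpy as np

def beta_values(meth, unmeth, offset=100.0):
    """Convert methylated/unmethylated intensities into beta values.

    beta = max(M, 0) / (max(M, 0) + max(U, 0) + offset). The default offset
    of 100 follows Illumina's convention and keeps beta defined when both
    intensities are close to zero. No other pre-processing is applied here.
    """
    m = np.clip(np.asarray(meth, dtype=float), 0.0, None)
    u = np.clip(np.asarray(unmeth, dtype=float), 0.0, None)
    return m / (m + u + offset)

# Two hypothetical probes: one mostly methylated, one mostly unmethylated.
print(beta_values([900.0, 100.0], [0.0, 1800.0]))
```

Because every pipeline transforms M and U differently before this step, two studies starting from identical raw intensities can deposit different β-values.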

Material and methods: To systematically investigate the effect of pre-processing on meta-analysis, we analysed four different collections of DNA methylation samples (datasets), each composed of two subsets, for which raw data from both controls (i.e. healthy subjects) and cases (i.e. patients) are available. We pre-processed the data from each dataset with nine of the most common pipelines found in the literature. Moreover, we evaluated the performance of regRCPqn, a modification of the RCP algorithm that aims to improve data consistency. For each combination of pre-processing pipelines (9 × 9), we first evaluated the between-sample variability among control subjects and then identified genomic positions that are differentially methylated between cases and controls (differential analysis).
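The 9 × 9 design can be pictured as a grid over pipeline pairs, with one pipeline applied to each of the two subsets. For each pair, the control samples of both subsets are pooled and their between-sample variability is summarised. This is a minimal sketch under stated assumptions. The inputs are assumed to be β-value matrices (probes × samples) with a shared probe order. The variability measure (mean per-probe standard deviation), the function names and the pipeline labels are all hypothetical illustrations, not the authors' implementation.

```python
import itertools
import numpy as np

def between_sample_variability(beta_a, beta_b):
    """Mean over probes of the standard deviation of beta values across the
    pooled control samples of two subsets (probes x samples, same probe order)."""
    pooled = np.hstack([beta_a, beta_b])
    return float(np.mean(np.std(pooled, axis=1, ddof=1)))

def variability_grid(controls_a, controls_b):
    """Score every (pipeline for subset A, pipeline for subset B) pair.

    controls_a / controls_b map a pipeline name to the control beta matrix of
    that subset after pre-processing with that pipeline. Nine pipelines per
    subset would give the 9 x 9 grid.
    """
    return {
        (pa, pb): between_sample_variability(controls_a[pa], controls_b[pb])
        for pa, pb in itertools.product(controls_a, controls_b)
    }

# Toy example: two hypothetical pipelines, two probes, two controls per subset.
controls_a = {"pipe1": np.array([[0.2, 0.2], [0.8, 0.8]]),
              "pipe2": np.array([[0.3, 0.3], [0.7, 0.7]])}
controls_b = {"pipe1": np.array([[0.2, 0.2], [0.8, 0.8]]),
              "pipe2": np.array([[0.3, 0.3], [0.7, 0.7]])}
grid = variability_grid(controls_a, controls_b)
for pair, score in sorted(grid.items()):
    print(pair, round(score, 4))
```

In this toy case, the off-diagonal cells (mismatched pipelines) score higher because the two subsets' β-distributions are shifted apart. That shift is the kind of non-biological variability the comparison is meant to expose.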

Results and conclusion: The pre-processing of DNA methylation data affects both the between-sample variability and the loci identified as differentially methylated, and its effects are strongly dataset-dependent. By contrast, applying our renormalization algorithm regRCPqn (i) reduces variability and (ii) increases agreement between meta-analysed datasets, both critical components of data harmonization.
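One way to quantify agreement between meta-analysed datasets is the overlap of the probe sets each one calls differentially methylated. The sketch below makes several simplifying assumptions. It uses a per-probe Welch t-statistic with a fixed, arbitrary |t| cut-off in place of a full test with p-values and multiple-testing correction, and it scores agreement with the Jaccard index. These choices, the function names and the probe labels are illustrative assumptions, not the criteria used in the study.

```python
import numpy as np

def welch_t(cases, controls):
    """Per-probe Welch t-statistic between cases and controls
    (arrays of shape probes x samples)."""
    m1, m2 = cases.mean(axis=1), controls.mean(axis=1)
    v1, v2 = cases.var(axis=1, ddof=1), controls.var(axis=1, ddof=1)
    se = np.sqrt(v1 / cases.shape[1] + v2 / controls.shape[1])
    return (m1 - m2) / se

def dmp_set(cases, controls, probe_ids, t_cut=4.0):
    """Probes whose |t| reaches an illustrative, arbitrary cut-off."""
    t = welch_t(cases, controls)
    return {pid for pid, ti in zip(probe_ids, t) if abs(ti) >= t_cut}

def jaccard(a, b):
    """Overlap of two probe sets; defined as 1.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

probes = ["probe_1", "probe_2", "probe_3"]
cases = np.array([[0.80, 0.82, 0.81], [0.50, 0.52, 0.49], [0.30, 0.31, 0.29]])
controls = np.array([[0.20, 0.22, 0.21], [0.50, 0.51, 0.49], [0.30, 0.32, 0.28]])
dmps_1 = dmp_set(cases, controls, probes)
dmps_2 = {"probe_1", "probe_3"}  # hypothetical calls from a second dataset
print(dmps_1, jaccard(dmps_1, dmps_2))
```

Computing this score for every pipeline pair gives a grid analogous to the variability grid. Under this metric, a harmonization step that works as intended would raise the overlap.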

Publication types

Evaluation Study
Research Support, Non-U.S. Gov't

MeSH terms

Animals
DNA Methylation*
High-Throughput Nucleotide Sequencing / methods
High-Throughput Nucleotide Sequencing / standards*
Humans
Meta-Analysis as Topic*
Sequence Analysis, DNA / methods
Sequence Analysis, DNA / standards*
Software / standards

Grants and funding

CS is funded by European Union’s Horizon 2020 research and innovation programme (H2020-MSCA-ITN grant agreement 721815 “IMforFUTURE”, https://imforfuture.eu/). The funder had no role in study design, data collection and analysis, decision to publish, or preparation of work included in this submission. The specific role of this author is articulated in the ‘author contributions’ section.