Background: Whole-genome bisulfite sequencing (WGBS) has become the standard method for interrogating plant methylomes at base resolution. However, deep WGBS measurements remain cost-prohibitive for large, complex genomes and for population-level studies. As a result, most published plant methylomes are sequenced far below saturation, with a large proportion of cytosines having either missing data or insufficient coverage.
Results: Here we present METHimpute, a Hidden Markov Model (HMM)-based imputation algorithm for the analysis of WGBS data. Unlike existing methods, METHimpute enables the construction of complete methylomes by inferring the methylation status and level of all cytosines in the genome regardless of coverage. Application of METHimpute to maize, rice and Arabidopsis shows that the algorithm infers cytosine-resolution methylomes with high accuracy from data with coverage as low as 6X, when compared with 60X data, thus making it a cost-effective solution for large-scale studies.
Conclusions: METHimpute provides methylation status calls and levels for all cytosines in the genome regardless of coverage, thus yielding complete methylomes even with low-coverage WGBS datasets. The method has been extensively tested in plants, but should also be applicable to other species. An implementation is available on Bioconductor.
Keywords: Hidden Markov Model; Imputation; Methylation; Whole-genome bisulfite sequencing.