An effective method to resolve ambiguous bisulfite-treated reads

BMC Bioinformatics. 2021 May 27;22(1):283. doi: 10.1186/s12859-021-04204-6.

Abstract

Background: Bisulfite treatment combined with next-generation sequencing is an important method for methylation analysis, and aligning the bisulfite-treated reads (BS-reads) is a critical step for downstream applications. Because bisulfite treatment reduces sequence complexity, a large portion of BS-reads align ambiguously to multiple locations of the reference genome; such reads are called multireads. Multireads cannot be used in downstream applications because they are likely to introduce artifacts. To identify the best mapping location of each multiread, existing Bayesian-based methods calculate the probability of the read at each position by considering how it overlaps with uniquely mapped reads. However, [Formula: see text]% of multireads do not overlap any uniquely mapped read and are therefore unresolvable by existing methods.
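
To illustrate why bisulfite treatment reduces sequence complexity, the following minimal sketch applies a full in-silico C-to-T conversion to two made-up loci. The function name `bisulfite_convert`, the two sequences, and the assumption that every cytosine is unmethylated are illustrative only and are not taken from the paper.

```python
def bisulfite_convert(seq: str) -> str:
    """In-silico conversion: every C is read as T.

    Assumption for illustration: no cytosine is methylated, so all are converted.
    """
    return seq.upper().replace("C", "T")


# Two hypothetical reference loci that differ only at C/T positions.
locus_a = "ACGTCCGATC"
locus_b = "ATGTTCGATT"

converted_a = bisulfite_convert(locus_a)
converted_b = bisulfite_convert(locus_b)

# The loci are distinct before conversion but identical afterwards,
# so a read from either one would match both.
collide = converted_a == converted_b
print(locus_a != locus_b, collide)
```

After conversion the alphabet at these positions effectively shrinks from four letters to three, which is why a read that would map uniquely in ordinary sequencing can become a multiread.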

Results: Here we propose a novel method (EM-MUL) that not only rescues multireads that overlap uniquely mapped reads, but also uses the overall coverage and accurate base-level alignment to resolve multireads that cannot be handled by current methods. We benchmark our method on both simulated and real datasets. Experimental results show that it aligns more than 80% of multireads to the best mapping position with very high accuracy.
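
The idea of rescuing a multiread through its overlap with uniquely mapped reads can be sketched as a simple normalized weighting. This is a toy illustration and not the EM-MUL implementation: the function `resolve_multiread`, the pseudocount prior, and the location labels are all assumptions introduced here.

```python
def resolve_multiread(candidates, unique_coverage, pseudocount=1.0):
    """Return (best_location, posteriors) for one multiread.

    candidates:      list of candidate location identifiers
    unique_coverage: dict mapping location -> number of overlapping
                     uniquely mapped reads (missing locations count as 0)
    pseudocount:     hypothetical smoothing term so no location gets weight 0
    """
    weights = {loc: unique_coverage.get(loc, 0) + pseudocount for loc in candidates}
    total = sum(weights.values())
    posteriors = {loc: w / total for loc, w in weights.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# One candidate overlaps 9 uniquely mapped reads, the other overlaps none.
best, post = resolve_multiread(["chr1:1000", "chr7:5200"], {"chr1:1000": 9})

# With no overlapping unique reads at all, the posteriors are tied,
# which is the case that overlap-only methods cannot resolve.
_, tie = resolve_multiread(["chr1:1000", "chr7:5200"], {})
```

In the tied case some additional signal is needed; according to the abstract, EM-MUL draws on overall coverage and base-level alignment for that purpose, which this sketch does not model.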

Conclusions: EM-MUL is an effective method designed to accurately determine the best mapping position of multireads in BS-reads. For downstream applications, it helps improve methylation resolution in repetitive regions of the genome. EM-MUL is freely available at https://github.com/lmylynn/EM-MUL.

Keywords: Bisulfite; DNA; Methylation; Multireads.

MeSH terms

  • Bayes Theorem
  • DNA Methylation
  • High-Throughput Nucleotide Sequencing
  • Sequence Analysis, DNA
  • Software*
  • Sulfites*

Substances

  • Sulfites
  • hydrogen sulfite