A model-based approach to identify binding sites in CLIP-Seq data

PLoS One. 2014 Apr 8;9(4):e93248. doi: 10.1371/journal.pone.0093248. eCollection 2014.

Abstract

Cross-linking immunoprecipitation coupled with high-throughput sequencing (CLIP-Seq) has made it possible to identify the target sites of RNA-binding proteins in various cell culture systems and tissue types on a genome-wide scale. Here we present a novel model-based approach (MiClip) to identify high-confidence protein-RNA binding sites from CLIP-Seq datasets. This approach assigns a probability score to each potential binding site to help prioritize subsequent validation experiments. The MiClip algorithm has been tested on both HITS-CLIP and PAR-CLIP datasets. In the HITS-CLIP dataset, the signal/noise ratios of miRNA seed motif enrichment produced by the MiClip approach are between 17% and 301% higher than those produced by the ad hoc method for the top 10 most enriched miRNAs. In the PAR-CLIP dataset, the MiClip approach identifies ∼50% more validated binding targets than the original ad hoc method and two recently published methods. To facilitate the application of the algorithm, we have released an R package, MiClip (http://cran.r-project.org/web/packages/MiClip/index.html), and a public web-based graphical user interface (http://galaxy.qbrc.org/tool_runner?tool_id=mi_clip) for customized analysis.
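
The abstract notes that a per-site probability score can help prioritize validation experiments. As a rough illustration only, the Python sketch below ranks candidate sites by such a score. It does not use or reproduce the MiClip package: the `CandidateSite` record, the `prioritize` helper, the coordinates, the scores, and the 0.9 cutoff are all hypothetical names and values chosen for this example.

```python
from typing import List, NamedTuple


class CandidateSite(NamedTuple):
    """Hypothetical record for one candidate binding site (not MiClip output)."""
    chrom: str
    start: int
    end: int
    probability: float  # assumed per-site score in [0, 1]


def prioritize(sites: List[CandidateSite],
               min_probability: float = 0.9) -> List[CandidateSite]:
    """Keep sites scoring at or above the cutoff, highest score first."""
    kept = [s for s in sites if s.probability >= min_probability]
    return sorted(kept, key=lambda s: s.probability, reverse=True)


# Made-up example data, for illustration only.
sites = [
    CandidateSite("chr1", 100, 130, 0.95),
    CandidateSite("chr2", 500, 530, 0.42),
    CandidateSite("chr1", 900, 930, 0.99),
]

ranked = prioritize(sites)
for site in ranked:
    print(site.chrom, site.start, site.end, site.probability)
```

The cutoff here is arbitrary; in practice it would presumably be chosen according to how many sites can be carried forward to validation.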

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Binding Sites
  • High-Throughput Nucleotide Sequencing / methods*
  • MicroRNAs / chemistry
  • MicroRNAs / metabolism
  • Models, Biological
  • RNA / chemistry
  • RNA / metabolism*
  • RNA-Binding Proteins / metabolism*
  • Sequence Analysis, RNA / methods

Substances

  • MicroRNAs
  • RNA-Binding Proteins
  • RNA