Assessing Computational Steps for CLIP-Seq Data Analysis

Biomed Res Int. 2015:2015:196082. doi: 10.1155/2015/196082. Epub 2015 Oct 11.

Abstract

RNA-binding proteins (RBPs) are key players in regulating gene expression at the posttranscriptional level. CLIP-Seq, which provides a genome-wide map of protein-RNA interactions, has been increasingly used to decipher RBP-mediated posttranscriptional regulation. Generating highly reliable binding sites from CLIP-Seq requires not only stringent library preparation but also considerable computational effort. Here we present a first systematic evaluation of the major computational steps for identifying RBP binding sites from CLIP-Seq data, including preprocessing, the choice of control samples, peak normalization, and motif discovery. We found that avoiding PCR amplification artifacts, normalizing to input RNA or mRNA-Seq, and defining the background model from control samples can reduce the bias introduced by RNA abundance and improve the quality of detected binding sites. Our findings can serve as a general guideline for CLIP experiment design and the comprehensive analysis of CLIP-Seq data.
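Two of the steps named above, removing PCR amplification artifacts and normalizing peak signal to a control such as input RNA or mRNA-Seq, can be illustrated with a minimal sketch. This is not the authors' pipeline: the read dictionaries, the function names `collapse_duplicates` and `normalized_enrichment`, the duplicate key (chromosome, start, strand, sequence), and the pseudocount of 1 are all assumptions made for illustration.

```python
import math


def collapse_duplicates(reads):
    """Keep the first read seen for each (chrom, start, strand, seq) key.

    Assumption: reads sharing position, strand, and sequence are treated
    as PCR duplicates. Real pipelines may use UMIs or other criteria.
    """
    seen = set()
    unique = []
    for read in reads:
        key = (read["chrom"], read["start"], read["strand"], read["seq"])
        if key not in seen:
            seen.add(key)
            unique.append(read)
    return unique


def normalized_enrichment(clip_count, control_count,
                          clip_total, control_total, pseudocount=1.0):
    """Return the log2 ratio of library-size-scaled CLIP to control counts.

    The pseudocount (an assumed value) avoids division by zero when a
    peak has no control reads.
    """
    clip_rate = (clip_count + pseudocount) / clip_total
    control_rate = (control_count + pseudocount) / control_total
    return math.log2(clip_rate / control_rate)


# Hypothetical toy reads: the first two are identical and collapse to one.
toy_reads = [
    {"chrom": "chr1", "start": 100, "strand": "+", "seq": "ACGT"},
    {"chrom": "chr1", "start": 100, "strand": "+", "seq": "ACGT"},
    {"chrom": "chr1", "start": 105, "strand": "+", "seq": "CGTA"},
]
unique_reads = collapse_duplicates(toy_reads)

# Hypothetical peak: 7 CLIP reads vs 1 control read, equal library sizes.
score = normalized_enrichment(7, 1, 1000, 1000)
```

Dividing by a control rate rather than using raw CLIP counts is what lets a highly expressed transcript with many background reads score lower than a weakly expressed transcript with the same CLIP coverage.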

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Binding Sites / genetics
  • Caco-2 Cells
  • Gene Expression Regulation
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • MicroRNAs / genetics
  • RNA, Messenger / genetics*
  • RNA, Messenger / metabolism
  • RNA-Binding Proteins / genetics*
  • RNA-Binding Proteins / metabolism
  • Sequence Analysis, RNA

Substances

  • MicroRNAs
  • RNA, Messenger
  • RNA-Binding Proteins