A Novel Method for Assessing the Statistical Significance of RNA-RNA Interactions Between Two Long RNAs

J Comput Biol. 2018 Sep;25(9):976-986. doi: 10.1089/cmb.2017.0260. Epub 2018 Jul 2.

Abstract

RNA-RNA interactions are key mechanisms through which noncoding RNA (ncRNA) regions exert biological functions. Computational prediction is essential for detecting novel RNA-RNA interactions because their comprehensive detection by biological experimentation is still quite difficult. Many RNA-RNA interaction prediction tools have been developed, but they tend to produce many false positives. Accordingly, assessing the statistical significance of computationally predicted interactions is an important task. However, no existing method for evaluating the statistical significance of RNA-RNA interactions is applicable to interactions between two long RNA sequences. We developed a method to calculate the p-value for the minimum interaction energy between two long RNA sequences. The method relies on the observation that minimum interaction energies between long RNAs follow a Gumbel distribution when repeat sequences in the RNAs are masked. To show the usefulness of the method, we applied it to all human 5'-untranslated region (UTR) and 3'-UTR sequences to detect novel 5'-UTR-3'-UTR interactions, and identified two significant 5'-UTR-3'-UTR interactions. Specifically, the human small proline-rich repeat protein 3 shows conserved 5'-UTR-3'-UTR interactions, with some nucleotide variations that preserve base pairings among primates. The method enables detection of statistically significant RNA-RNA interactions between long RNAs such as long ncRNAs. Statistical significance estimates help in identifying interactions for experimental validation and provide novel insights into the function of ncRNA regions.
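As an illustration of how a p-value can be read off a Gumbel distribution of minimum energies, the sketch below fits a Gumbel (minimum-type) distribution to a background set of minimum interaction energies and evaluates the lower-tail probability of an observed energy. This is not the authors' implementation: the method-of-moments fit, the function names `fit_gumbel_min` and `gumbel_min_pvalue`, and the idea of a precomputed background list are all assumptions made for the example; the abstract does not state how the distribution parameters are estimated.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_min(energies):
    """Estimate (mu, beta) of a Gumbel distribution for minima.

    Uses the method of moments (an assumption of this sketch):
    variance = (pi * beta)^2 / 6 and mean = mu - gamma * beta.
    """
    mean = statistics.fmean(energies)
    sd = statistics.stdev(energies)
    beta = sd * math.sqrt(6.0) / math.pi
    mu = mean + EULER_GAMMA * beta
    return mu, beta


def gumbel_min_pvalue(energy, mu, beta):
    """Probability of a minimum energy at least as low as `energy`.

    Lower-tail CDF of the minimum-type Gumbel distribution:
    P(E <= e) = 1 - exp(-exp((e - mu) / beta)).
    """
    z = (energy - mu) / beta
    if z > 700.0:  # exp(z) would overflow; the CDF is 1 to machine precision
        return 1.0
    # -expm1(-x) computes 1 - exp(-x) accurately for very small x
    return -math.expm1(-math.exp(z))
```

A hypothetical use would be to collect minimum interaction energies for a background set of repeat-masked sequence pairs, call `fit_gumbel_min` on that list, and then pass the minimum energy of the pair under study to `gumbel_min_pvalue`. More negative energies give smaller p-values, and multiple-testing correction would still be needed when many pairs are screened.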

Keywords: RNA bioinformatics; RNA-RNA interaction; statistical test.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • 3' Untranslated Regions*
  • 5' Untranslated Regions*
  • Algorithms*
  • Base Sequence
  • Cornified Envelope Proline-Rich Proteins / chemistry
  • Cornified Envelope Proline-Rich Proteins / metabolism*
  • Humans
  • RNA, Untranslated / chemistry
  • RNA, Untranslated / metabolism*
  • Sequence Homology

Substances

  • 3' Untranslated Regions
  • 5' Untranslated Regions
  • Cornified Envelope Proline-Rich Proteins
  • RNA, Untranslated
  • SPRR3 protein, human