Tree polynomials identify a link between co-transcriptional R-loops and nascent RNA folding

PLoS Comput Biol. 2024 Dec 13;20(12):e1012669. doi: 10.1371/journal.pcbi.1012669. eCollection 2024 Dec.

Abstract

R-loops are a class of non-canonical nucleic acid structures that typically form during transcription when the nascent RNA hybridizes with the DNA template strand, leaving the non-template DNA strand unpaired. These structures are abundant in nature and play important physiological and pathological roles. Recent research shows that DNA sequence and topology affect R-loops, yet it remains unclear how these and other factors contribute to R-loop formation. In this work, we investigate the link between nascent RNA folding and the formation of R-loops. We introduce tree-polynomials, a new class of representations of RNA secondary structures. A tree-polynomial representation consists of a rooted tree associated with an RNA secondary structure together with a polynomial that is uniquely identified with the rooted tree. Tree-polynomials enable accurate, interpretable and efficient data analysis of RNA secondary structures without pseudoknots. We develop a computational pipeline for investigating and predicting R-loop formation from a genomic sequence. The pipeline obtains nascent RNA secondary structures from co-transcriptional RNA folding software and computes the tree-polynomial representations of the structures. By applying this pipeline to plasmid sequences that contain R-loop forming genes, we establish a strong correlation between the coefficient sums of tree-polynomials and the experimental probability of R-loop formation. This strong correlation indicates that the pipeline can be used for accurate R-loop prediction. Furthermore, the interpretability of tree-polynomials allows us to characterize the features of RNA secondary structure associated with R-loop formation. In particular, we identify that branches with short stems separated by bulges and interior loops are associated with R-loops.
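
To make the idea of a tree-polynomial concrete, the following is a minimal Python sketch, not the authors' pipeline. The abstract does not specify how the tree or the polynomial is built, so two assumptions are made here for illustration: (1) the rooted tree is read from a pseudoknot-free dot-bracket string, with an artificial exterior root, one internal node per base pair, and one leaf per unpaired base; (2) the polynomial follows a known tree-distinguishing recursion in two variables, where a leaf gets x and an internal node gets y plus the product of its children's polynomials. The function names `parse_dot_bracket`, `tree_polynomial`, and `coefficient_sum` are hypothetical.

```python
from collections import defaultdict


def parse_dot_bracket(structure):
    """Turn a pseudoknot-free dot-bracket string into a nested-list rooted tree.

    Assumed convention (illustration only): the outer list is an artificial
    exterior root, each base pair is an internal node, and each unpaired base
    is a leaf. A leaf is represented by an empty list.
    """
    root = []
    stack = [root]
    for ch in structure:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: unmatched ')'")
            stack.pop()
        elif ch == ".":
            stack[-1].append([])
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: unmatched '('")
    return root


def poly_mul(p, q):
    """Multiply two bivariate polynomials stored as {(deg_x, deg_y): coeff}."""
    out = defaultdict(int)
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            out[(a + d, b + e)] += c * f
    return dict(out)


def tree_polynomial(node):
    """Assumed recursion: P(leaf) = x, P(node) = y + product of P(children)."""
    if not node:
        return {(1, 0): 1}  # the monomial x
    prod = {(0, 0): 1}  # the constant 1
    for child in node:
        prod = poly_mul(prod, tree_polynomial(child))
    prod[(0, 1)] = prod.get((0, 1), 0) + 1  # add the monomial y
    return prod


def coefficient_sum(poly):
    """Sum of all coefficients, i.e. the polynomial evaluated at x = y = 1."""
    return sum(poly.values())


if __name__ == "__main__":
    hairpin = parse_dot_bracket("(...)")
    print(tree_polynomial(hairpin))   # {(3, 0): 1, (0, 1): 2}, i.e. x^3 + 2y
    print(coefficient_sum(tree_polynomial(hairpin)))  # 3
```

Under these assumptions, a single hairpin `(...)` maps to x^3 + 2y, whose coefficient sum is 3. Storing the polynomial as a dictionary of exponent pairs keeps the sketch free of third-party dependencies; the tree convention and recursion used in the published work may differ.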

MeSH terms

  • Algorithms
  • Computational Biology* / methods
  • Nucleic Acid Conformation*
  • R-Loop Structures* / genetics
  • RNA Folding* / genetics
  • RNA* / chemistry
  • RNA* / genetics
  • Software
  • Transcription, Genetic / genetics

Substances

  • RNA

Grants and funding

P.L., J.L. and M.V. were supported by the National Science Foundation (NSF - https://www.nsf.gov) grant DMS/NIGMS#2054347. N.J. was supported by NSF grant DMS/NIGMS#2054321. In addition, M.V. acknowledges support by NSF grant DMS#1817156, and N.J. acknowledges support in part by NSF grant CCF#2107267, the W.M. Keck Foundation (https://www.wmkeck.org), and the Center for Mathematics of Complex Biological Systems under NSF grant DMS#1764406 and Simons Foundation (https://www.simonsfoundation.org) grant #594594. The funders did not play any role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.