The full-length transcriptome of C. elegans using direct RNA sequencing

Nathan P Roach; Norah Sadowski; Amelia F Alessi; Winston Timp; James Taylor; John K Kim

doi:10.1101/gr.251314.119

Genome Res. 2020 Feb;30(2):299-312. doi: 10.1101/gr.251314.119. Epub 2020 Feb 5.

Authors

Nathan P Roach¹, Norah Sadowski², Amelia F Alessi¹, Winston Timp², James Taylor¹,³, John K Kim¹

Affiliations

¹ Department of Biology, Johns Hopkins University, Baltimore, Maryland 21218, USA.
² Department of Biomedical Engineering, Department of Molecular Biology and Genetics, Johns Hopkins University, Baltimore, Maryland 21218, USA.
³ Department of Computer Science, Johns Hopkins University, Baltimore, Maryland 21218, USA.

Abstract

Current transcriptome annotations have largely relied on short read lengths intrinsic to the most widely used high-throughput cDNA sequencing technologies. For example, in the annotation of the Caenorhabditis elegans transcriptome, more than half of the transcript isoforms lack full-length support and instead rely on inference from short reads that do not span the full length of the isoform. We applied nanopore-based direct RNA sequencing to characterize the developmental polyadenylated transcriptome of C. elegans. Taking advantage of long reads spanning the full length of mRNA transcripts, we provide support for 23,865 splice isoforms across 14,611 genes, without the need for computational reconstruction of gene models. Of the isoforms identified, 3452 are novel splice isoforms not present in the WormBase WS265 annotation. Furthermore, we identified 16,342 isoforms in the 3' untranslated region (3' UTR), 2640 of which are novel and do not fall within 10 bp of existing 3'-UTR data sets and annotations. Combining 3' UTRs and splice isoforms, we identified 28,858 full-length transcript isoforms. We also determined that poly(A) tail lengths of transcripts vary across development, as do the strengths of previously reported correlations between poly(A) tail length and expression level, and poly(A) tail length and 3'-UTR length. Finally, we have formatted this data as a publicly accessible track hub, enabling researchers to explore this data set easily in a genome browser.

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Alternative Splicing / genetics
Animals
Caenorhabditis elegans / genetics*
Caenorhabditis elegans / growth & development
Exons / genetics
Gene Expression Regulation, Developmental / genetics
Genome / genetics*
Molecular Sequence Annotation
RNA, Messenger / genetics*
Sequence Analysis, RNA
Transcriptome / genetics*

Substances

RNA, Messenger
