Transposable elements, such as Long INterspersed Elements (LINEs), are DNA sequences that can replicate within genomes. LINEs replicate using an RNA intermediate followed by reverse transcription and are typically a few kilobases in length. LINE activity creates genomic structural variants in human populations and leads to somatic alterations in cancer genomes. Long-read RNA sequencing technologies, including Oxford Nanopore and PacBio, can directly sequence relatively long transcripts, thus providing the opportunity to examine full-length LINE transcripts. This study focuses on the development of a new bioinformatics pipeline for the identification and quantification of active, full-length LINE transcripts in diverse human tissues and cell lines. In our pipeline, we utilized RepeatMasker to identify LINE-1 (L1) transcripts from long-read transcriptome data and incorporated several criteria, such as transcript start position, divergence, and length, to remove likely false positives. Comparisons between cancerous and normal cell lines, as well as human tissue samples, revealed elevated expression levels of young LINEs in cancer, particularly at intact L1 loci. By employing bioinformatics methodologies on long-read transcriptome data, this study demonstrates the landscape of L1 expression in tissues and cell lines.
Keywords: LINE-1; Long INterspersed Elements; TE; gene expression; long-read sequencing; transposable elements.