Summary: Typical RNA sequencing (RNA-Seq) analyses are performed either at the gene level by summing all reads from the same locus, assuming that all transcripts from a gene make a protein or at the transcript level, assuming that each transcript displays unique function. However, these assumptions are flawed, as a gene can code for different types of transcripts and different transcripts are capable of synthesizing similar, different or no protein. As a consequence, functional changes are not well illustrated by either gene or transcript analyses. We propose to improve RNA-Seq analyses by grouping the transcripts based on their similar functions. We developed FuSe to predict functional similarities using the primary and secondary structure of proteins. To estimate the likelihood of proteins with similar functions, FuSe computes two confidence scores: knowledge (KS) and discovery (DS) for protein pairs. Overlapping protein pairs exhibiting high confidence are grouped to form 'similar function protein groups' and expression is calculated for each functional group. The impact of using FuSe is demonstrated on in vitro cells exposed to paracetamol, which highlight genes responsible for cell adhesion and glycogen regulation which were earlier shown to be not differentially expressed with traditional analysis methods.
Availability and implementation: The source code is available at https://github.com/rajinder4489/FuSe. Data for APAP exposure are available in the BioStudies database (http://www.ebi.ac.uk/biostudies) under accession numbers S-HECA143, S-HECA(158) and S-HECA139.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press.