Background: RNA sequencing (RNA-seq) and microarrays are two transcriptomics techniques aimed at the quantification of transcribed genes and their isoforms. Here we compare the latest Affymetrix HTA 2.0 microarray with Illumina 2000 RNA-seq for the analysis of patient samples - normal lung epithelium tissue and squamous cell carcinoma lung tumours. Protein coding mRNAs and long non-coding RNAs (lncRNAs) were included in the study.
Results: Both platforms performed equally well for protein-coding RNAs, however the stochastic variability was higher for the sequencing data than for microarrays. This reduced the number of differentially expressed genes and genes with predictive potential for RNA-seq compared to microarray data. Analysis of this variability revealed a lack of reads for short and low abundant genes; lncRNAs, being shorter and less abundant RNAs, were found especially susceptible to this issue. A major difference between the two platforms was uncovered by analysis of alternatively spliced genes. Investigation of differential exon abundance showed insufficient reads for many exons and exon junctions in RNA-seq while the detection on the array platform was more stable. Nevertheless, we identified 207 genes which undergo alternative splicing and were consistently detected by both techniques.
Conclusions: Despite the fact that the results of gene expression analysis were highly consistent between Human Transcriptome Arrays and RNA-seq platforms, the analysis of alternative splicing produced discordant results. We concluded that modern microarrays can still outperform sequencing for standard analysis of gene expression in terms of reproducibility and cost.
Keywords: Differential exon usage; Differential expression analysis; Microarrays; RNA sequencing; Splicing.