RNA sequencing (RNA-Seq) is a powerful technology that allows unbiased profiling of the entire transcriptome. Analyzing transcriptome profiles from heterogeneous tissues poses a significant challenge, because such tissues are admixtures of cell types whose relative proportions can vary several-fold across samples. Blood is perhaps the most egregious example. Here, we describe in detail a computational pipeline for RNA-Seq data preparation and statistical analysis, and develop a method for estimating the cell type composition of blood samples from their bulk RNA-Seq profiles. We also illustrate, in a whole blood RNA-Seq dataset, the importance of adjusting for the potential confounding effect of cellular heterogeneity in statistical inference.
Keywords: Cell type-specific deconvolution; Cellular heterogeneity; RNA-Seq; Transcriptomics; Whole blood.