Several studies have compared transcriptomes across brain regions of Huntington's disease (HD) gene-positive and neurologically normal individuals to identify differentially expressed genes (DEGs) that could serve as pharmaceutical or prognostic targets for HD. Although these studies followed technical recommendations for optimal RNA-Seq analysis, none of the genes they identified as upregulated has yet proven successful as a prognostic or therapeutic target for HD. Earlier studies used neurologically normal controls who were older than the HD gene-positive group. Because aging induces gradual transcriptional changes in the brain, we hypothesized that using older controls could lead to the misidentification of DEGs. To test this hypothesis, we reanalyzed 146 samples from one such study, available in the SRA database, and used Propensity Score Matching (PSM) to create a "virtual" control group whose age distribution is statistically comparable to that of the HD gene-positive group. Our results show that using neurologically normal individuals older than 75 years as controls in differential gene expression analysis produces both false positives and false negatives, and that the resulting misidentification of DEGs hampers the discovery of potential pharmaceutical and prognostic markers. The age of control samples is therefore a key factor in RNA-Seq analysis and should be included when evaluating best practices for such investigations. Although our primary focus is HD, our findings suggest that careful selection of age-appropriate control samples can substantially improve differential expression analysis more generally.
Keywords: Huntington’s disease; PSM; RNA-seq analysis; aging; bioinformatics; case-control.
Copyright © 2024 Dias Pinto, Faustinoni Neto, Sanches Fernandes, Kerkis and Araldi.
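The age-matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not state which covariates, software, or matching settings were used, so the sketch assumes age is the only covariate, estimates the propensity score with a small hand-written logistic regression, and applies greedy 1:1 nearest-neighbour matching without replacement. The function names, the caliper of 0.1, and all ages are hypothetical and chosen only for illustration.

```python
import math

def fit_logistic(ages, labels, lr=0.1, epochs=5000):
    """Fit P(HD | age) = sigmoid(b0 + b1 * z) by gradient descent,
    where z is the standardized age. Returns a scoring function."""
    n = len(ages)
    mean = sum(ages) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in ages) / n) or 1.0
    z = [(a - mean) / sd for a in ages]
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for zi, yi in zip(z, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * zi)))
            g0 += p - yi
            g1 += (p - yi) * zi
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return lambda age: 1.0 / (1.0 + math.exp(-(b0 + b1 * (age - mean) / sd)))

def match_controls(hd_ages, control_ages, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching on the propensity score,
    without replacement. The caliper value is an assumption."""
    ages = hd_ages + control_ages
    labels = [1] * len(hd_ages) + [0] * len(control_ages)
    score = fit_logistic(ages, labels)
    available = list(range(len(control_ages)))
    pairs = []
    for i, age in enumerate(hd_ages):
        if not available:
            break
        s = score(age)
        # Closest still-unmatched control in propensity-score space.
        j = min(available, key=lambda k: abs(score(control_ages[k]) - s))
        if abs(score(control_ages[j]) - s) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

# Hypothetical ages in years, for illustration only (not data from the study).
hd_ages = [48, 52, 55, 60, 63, 67]
control_ages = [50, 58, 61, 66, 76, 79, 82, 85, 88, 90]

pairs = match_controls(hd_ages, control_ages)
matched_control_ages = [control_ages[j] for _, j in pairs]
print("matched (HD index, control index) pairs:", pairs)
print("ages of retained controls:", matched_control_ages)
```

Controls left unmatched are dropped from the "virtual" control group, and HD samples with no control inside the caliper stay unpaired. In practice a dedicated matching package would usually be preferable, with covariate balance checked after matching; the greedy loop here only illustrates the idea.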