An integrated analysis of DNA, RNA, and protein, in so-called proteogenomic studies, has the potential to greatly increase our understanding of both normal physiology and disease development. However, such studies are hampered by the lack of a systematic approach for credentialing individual samples. Uncredentialed samples introduce noise into the data, which limits the ability to identify important biological signals. Indeed, a recent proteogenomic study by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) found 26% of samples to be unsatisfactory, which markedly increased cost and caused a loss of information content. Based on a large-scale analysis of RNA-seq data and of proteomic data generated by reverse-phase protein arrays (RPPA) and by mass spectrometry, we propose a protein-mRNA correlation-based (PMC) score as a robust metric for credentialing individual samples in integrated proteogenomic studies. Samples with high PMC scores have significantly higher protein-mRNA correlation, total protein content, and tumor purity. Our results highlight the importance of credentialing individual samples before proteogenomic analysis.
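The abstract reports that high-PMC samples show higher protein-mRNA correlation. The PMC score's own formula is not given here, so the sketch below does not implement it. It illustrates only the simpler, related quantity: a per-sample Spearman correlation between mRNA and protein abundance across the genes measured in both assays. The dictionary input format and all function names are hypothetical choices for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def sample_protein_mrna_correlation(mrna, protein):
    """Spearman correlation across genes shared by both assays for ONE sample.

    mrna, protein: dicts mapping gene symbol -> abundance (hypothetical format).
    This is NOT the PMC score; it is only a sample-level correlation sketch.
    """
    genes = sorted(set(mrna) & set(protein))  # keep genes measured in both assays
    if len(genes) < 3:
        raise ValueError("need at least 3 shared genes")
    x = [mrna[g] for g in genes]
    y = [protein[g] for g in genes]
    # Spearman correlation = Pearson correlation of the rank vectors.
    return pearson(average_ranks(x), average_ranks(y))
```

For example, a sample whose protein levels rise monotonically with mRNA levels across shared genes yields a correlation of 1.0. Genes present in only one assay are ignored. A rank-based correlation was chosen here because it is insensitive to the differing scales of RNA-seq and proteomic measurements.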
Keywords: Bioinformatics; Data evaluation; Mass Spectrometry; Protein array; Proteogenomics; Reverse-phase protein array.
© 2018 Zhao et al.