DAISM-DNNXMBD: Highly accurate cell type proportion estimation with in silico data augmentation and deep neural networks

Patterns (N Y). 2022 Feb 3;3(3):100440. doi: 10.1016/j.patter.2022.100440. eCollection 2022 Mar 11.

Abstract

Understanding immune cell abundance in cancer and other disease-related tissues plays an important role in guiding disease treatment. Computational methods for cell type proportion estimation have previously been developed to derive this information from bulk RNA sequencing data. Our results show, however, that the performance of these methods can be seriously degraded by the mismatch between training data and real-world data. To tackle this issue, we propose the DAISM-DNNXMBD pipeline (XMBD: Xiamen Big Data, a biomedical open software initiative in the National Institute for Data Science in Health and Medicine, Xiamen University, China), denoted as DAISM-DNN. The pipeline trains a deep neural network (DNN) on dataset-specific training data generated from a certain number of calibrated samples using DAISM, a novel data augmentation method with an in silico mixing strategy. The evaluation results demonstrate that the DAISM-DNN pipeline consistently and substantially outperforms other existing methods for all the cell types under evaluation in real-world datasets.
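The in silico mixing idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the mixing procedure, so the sketch assumes that one calibrated sample (a bulk expression profile with known cell type proportions) is linearly combined with purified per-cell-type expression profiles using random weights, and that the label of the augmented sample is the correspondingly weighted mixture of proportions. The function name `mix_in_silico`, the array shapes, and the use of Dirichlet-distributed weights are all hypothetical choices made for illustration.

```python
import numpy as np


def mix_in_silico(calibrated_expr, calibrated_props, purified_expr, rng):
    """Create one augmented (expression, proportions) training pair.

    Assumed inputs (hypothetical shapes, not taken from the paper):
      calibrated_expr  : (n_genes,)         bulk profile of a calibrated sample
      calibrated_props : (n_types,)         its known cell type proportions
      purified_expr    : (n_types, n_genes) one reference profile per cell type
    """
    n_types = purified_expr.shape[0]

    # Random mixing weights that sum to 1: the first weight belongs to the
    # calibrated sample, the remaining ones to the purified cell types.
    weights = rng.dirichlet(np.ones(n_types + 1))
    w_cal, w_pure = weights[0], weights[1:]

    # Expression of the synthetic sample: weighted sum of the mixed profiles.
    mixed_expr = w_cal * calibrated_expr + w_pure @ purified_expr

    # Matching label: calibrated proportions scaled by their weight, plus the
    # weight contributed directly by each purified cell type.
    mixed_props = w_cal * calibrated_props + w_pure
    return mixed_expr, mixed_props


# Toy usage with made-up numbers (3 cell types, 4 genes).
rng = np.random.default_rng(0)
cal_expr = np.array([1.0, 2.0, 3.0, 4.0])
cal_props = np.array([0.5, 0.3, 0.2])
pure_expr = np.eye(3, 4)
x_aug, y_aug = mix_in_silico(cal_expr, cal_props, pure_expr, rng)
```

Repeating the call with different calibrated samples and random weights would yield a training set of arbitrary size whose expression profiles stay anchored to the target dataset, which is the motivation the abstract gives for dataset-specific augmentation.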

Keywords: cell type proportion estimation; data augmentation; data simulation; deconvolution; deep learning.