Identifying cell types from expression profiles is a pillar of single-cell analysis. Existing machine-learning methods learn predictive features from annotated training data, which are often unavailable in early-stage studies. When such data are scarce, these methods can overfit and perform poorly on new data. To address these challenges, we present scROSHI, which uses previously obtained cell type-specific gene lists and requires neither training nor annotated data. By respecting the hierarchical nature of cell type relationships and assigning cells consecutively to more specialized identities, scROSHI achieves excellent prediction performance. In a benchmark based on publicly available PBMC data sets, scROSHI outperforms competing methods when training data are limited or the diversity between experiments is large.
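The core idea of training-free, hierarchical assignment from marker gene lists can be illustrated with a minimal sketch. The hierarchy, marker genes, mean-expression score and fixed `min_score` threshold below are all hypothetical simplifications chosen for illustration. This is not scROSHI's actual scoring model or default gene lists. A cell is first assigned to a broad type and then, level by level, to more specialized subtypes. It stays at the coarser label when no subtype's markers are expressed strongly enough.

```python
from statistics import mean

# Hypothetical cell-type hierarchy: each parent maps to its more specialized children.
HIERARCHY = {
    "root": ["T cell", "B cell", "Monocyte"],
    "T cell": ["CD4 T cell", "CD8 T cell"],
}

# Hypothetical marker gene lists (illustrative only, not scROSHI's defaults).
MARKERS = {
    "T cell": ["CD3D", "CD3E"],
    "B cell": ["CD79A", "MS4A1"],
    "Monocyte": ["CD14", "LYZ"],
    "CD4 T cell": ["CD4", "IL7R"],
    "CD8 T cell": ["CD8A", "CD8B"],
}


def marker_score(cell, genes):
    """Mean expression of a cell type's marker genes; genes absent from the cell count as 0."""
    return mean(cell.get(g, 0.0) for g in genes)


def assign(cell, hierarchy=HIERARCHY, markers=MARKERS, min_score=1.0):
    """Walk down the hierarchy, choosing the best-scoring child at each level.

    No training data are used: only the marker lists. The walk stops at the
    current label when no child reaches min_score, so a cell can keep a coarse
    identity instead of being forced into a subtype.
    """
    label = "root"
    while label in hierarchy:
        scores = {child: marker_score(cell, markers[child]) for child in hierarchy[label]}
        best = max(scores, key=scores.get)
        if scores[best] < min_score:
            break  # no sufficiently supported specialization at this level
        label = best
    return "unassigned" if label == "root" else label
```

For example, `assign({"CD3D": 3.0, "CD3E": 2.5, "CD8A": 2.0, "CD8B": 1.8})` returns `"CD8 T cell"`. A cell expressing only `CD3D` and `CD3E` stays at `"T cell"`. Weak, ambiguous expression leaves the cell `"unassigned"`. Stopping early rather than always descending to a leaf is one simple way to avoid over-specific calls when subtype markers are uninformative.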
© The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.