CHETAH: a selective, hierarchical cell type identification method for single-cell RNA sequencing

Nucleic Acids Res. 2019 Sep 19;47(16):e95. doi: 10.1093/nar/gkz543.

Abstract

Cell type identification is essential for single-cell RNA sequencing (scRNA-seq) studies, which are currently transforming the life sciences. CHETAH (CHaracterization of cEll Types Aided by Hierarchical classification) is an accurate cell type identification algorithm that is rapid and selective, and that can assign cells to intermediate or unassigned categories. Evidence for assignment is based on a classification tree of previously available scRNA-seq reference data and includes a confidence score based on the variance in gene expression per cell type. For cell types represented in the reference data, CHETAH's accuracy is as good as that of existing methods. Its specificity is superior when cells of an unknown type are encountered, such as malignant cells in tumor samples, which it pinpoints as intermediate or unassigned. Although designed for tumor samples in particular, the use of unassigned and intermediate types is also valuable in other exploratory studies. This is exemplified in pancreas datasets, where CHETAH highlights cell populations not well represented in the reference dataset, including cells with profiles that lie on a continuum between those of acinar and ductal cell types. The possibility of unassigned and intermediate cell types is pivotal for preventing misclassification and can yield important biological information for previously unexplored tissues.
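
The idea of selective, hierarchical assignment can be illustrated with a minimal sketch. This is not CHETAH's implementation: the toy reference profiles, the tree, the correlation-difference confidence measure, the threshold of 0.3, and all names (`REFERENCE`, `TREE`, `classify`) are assumptions made for illustration only. The sketch walks a cell down a classification tree and stops at the first node where the evidence for one branch over the other is too weak, returning an intermediate or unassigned label instead of forcing a final cell type.

```python
import numpy as np

# Hypothetical reference: one mean expression profile per cell type (toy 4-gene data).
REFERENCE = {
    "T cell": np.array([9.0, 8.0, 1.0, 0.5]),
    "B cell": np.array([8.0, 1.0, 9.0, 0.5]),
    "Acinar": np.array([0.5, 1.0, 0.5, 9.0]),
}

# Hypothetical tree: an internal node is (name, left, right); a leaf is a cell type name.
TREE = ("root", ("immune", "T cell", "B cell"), "Acinar")


def leaves(node):
    """Return all cell types below a node."""
    if isinstance(node, str):
        return [node]
    _, left, right = node
    return leaves(left) + leaves(right)


def branch_profile(node):
    """Average the reference profiles of all cell types below a node."""
    return np.mean([REFERENCE[t] for t in leaves(node)], axis=0)


def classify(cell, node=TREE, threshold=0.3, at_root=True):
    """Descend the tree while the confidence stays at or above the threshold."""
    if isinstance(node, str):
        return node  # reached a final cell type
    name, left, right = node
    r_left = np.corrcoef(cell, branch_profile(left))[0, 1]
    r_right = np.corrcoef(cell, branch_profile(right))[0, 1]
    # Assumed confidence measure: how much better one branch fits than the other.
    confidence = abs(r_left - r_right)
    if confidence < threshold:
        return "unassigned" if at_root else f"intermediate ({name})"
    best = left if r_left > r_right else right
    return classify(cell, best, threshold, at_root=False)


if __name__ == "__main__":
    print(classify(np.array([9.0, 8.0, 1.0, 0.5])))  # matches the T cell profile
    print(classify(np.array([8.5, 4.5, 5.0, 0.5])))  # halfway between T and B cell
    print(classify(np.array([1.0, 9.0, 1.0, 2.0])))  # resembles neither top-level branch
```

In this toy setup, a cell that matches a reference profile reaches a leaf, a cell halfway between two sibling types stops at their shared node, and a cell that fits neither top-level branch is left unassigned. Raising the threshold makes the classifier more selective at the cost of fewer final assignments.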

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Acinar Cells / immunology
  • Acinar Cells / pathology
  • Algorithms*
  • Base Sequence
  • Cell Lineage / genetics*
  • Cell Lineage / immunology
  • Cluster Analysis
  • Datasets as Topic
  • Dendritic Cells / immunology
  • Dendritic Cells / pathology
  • Gene Expression Profiling
  • High-Throughput Nucleotide Sequencing / methods*
  • Humans
  • Neoplasms / genetics*
  • Neoplasms / immunology
  • Neoplasms / pathology
  • Organ Specificity
  • Pancreas / immunology
  • Pancreas / pathology
  • RNA, Messenger / analysis*
  • RNA, Messenger / genetics
  • Sequence Analysis, RNA / statistics & numerical data*
  • Single-Cell Analysis / methods*
  • Software
  • T-Lymphocytes / immunology
  • T-Lymphocytes / pathology
  • Tumor Cells, Cultured

Substances

  • RNA, Messenger