Clustering Deviation Index (CDI): a robust and accurate internal measure for evaluating scRNA-seq data clustering

Genome Biol. 2022 Dec 27;23(1):269. doi: 10.1186/s13059-022-02825-5.

Abstract

Most single-cell RNA sequencing (scRNA-seq) analyses begin with cell clustering; thus, the clustering accuracy considerably impacts the validity of downstream analyses. In contrast with the abundance of clustering methods, the tools to assess the clustering accuracy are limited. We propose a new Clustering Deviation Index (CDI) that measures the deviation of any clustering label set from the observed single-cell data. We conduct in silico and experimental scRNA-seq studies to show that CDI can select the optimal clustering label set. As a result, CDI also informs the optimal tuning parameters for any given clustering method and the correct number of cluster components.

Keywords: Assessment; Clustering; RNA-sequencing; Single-cell.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms*
  • Cluster Analysis
  • Gene Expression Profiling* / methods
  • Sequence Analysis, RNA / methods
  • Single-Cell Analysis / methods
  • Single-Cell Gene Expression Analysis