MultiK: an automated tool to determine optimal cluster numbers in single-cell RNA sequencing data

Genome Biol. 2021 Aug 19;22(1):232. doi: 10.1186/s13059-021-02445-5.

Abstract

Single-cell RNA sequencing (scRNA-seq) provides new opportunities to characterize cell populations, typically through some form of clustering analysis. Estimating the optimal number of clusters (K) is a crucial step that is often ignored. Our approach improves on most current scRNA-seq clustering methods by providing an objective estimate of the number of groups from a multi-resolution perspective. MultiK is a tool for the objective selection of insightful Ks, and it achieves high robustness through a consensus clustering approach. We demonstrate that MultiK identifies reproducible groups in scRNA-seq data, thus providing an objective means of estimating the number of groups or cell-type populations present.
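
The consensus clustering idea mentioned above can be illustrated with a minimal sketch. It is not MultiK's implementation: the function names `consensus_matrix` and `pac`, the use of -1 to mark cells left out of a subsampled run, and the 0.1/0.9 thresholds are all assumptions made for illustration. The sketch takes cluster labels from repeated runs, computes how often each pair of cells is clustered together, and scores stability with the proportion of ambiguous clustering (PAC), one commonly used consensus score. A lower PAC suggests a more reproducible partition.

```python
import numpy as np

def consensus_matrix(runs):
    """Fraction of runs in which each pair of cells shares a cluster.

    runs: sequence of equal-length integer label vectors, one per run.
    A label of -1 (an assumed convention) marks a cell that was not
    sampled in that run. Pairs never sampled together get NaN.
    """
    runs = np.asarray(runs)
    n_cells = runs.shape[1]
    together = np.zeros((n_cells, n_cells))
    sampled = np.zeros((n_cells, n_cells))
    for labels in runs:
        present = labels >= 0
        both = np.outer(present, present)  # both cells sampled in this run
        same = (labels[:, None] == labels[None, :]) & both
        together += same
        sampled += both
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, np.nan)

def pac(consensus, lower=0.1, upper=0.9):
    """Proportion of cell pairs with an ambiguous consensus value.

    The thresholds are illustrative defaults, not values from the paper.
    """
    upper_triangle = np.triu_indices_from(consensus, k=1)
    values = consensus[upper_triangle]
    values = values[~np.isnan(values)]
    return float(np.mean((values > lower) & (values < upper)))

# Three runs that agree on the partition (cluster IDs may be permuted).
stable_runs = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0]]
# Two runs that disagree about which cells belong together.
unstable_runs = [[0, 0, 1, 1], [0, 1, 0, 1]]

stable_pac = pac(consensus_matrix(stable_runs))
unstable_pac = pac(consensus_matrix(unstable_runs))
```

In a multi-resolution setting, one would group runs by the number of clusters they produced and compare a stability score such as this across candidate Ks. How MultiK combines such scores with other criteria is not described in this abstract.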

Keywords: Clustering; Genomics; Multi-resolution; Multi-scale; Reproducibility; Single-cell RNA-seq.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Animals
  • Cell Line
  • Cluster Analysis
  • Gene Expression
  • Genomics
  • Humans
  • Mammary Glands, Animal
  • Mice
  • RNA-Seq
  • Sequence Analysis, RNA / methods*
  • Single-Cell Analysis / methods*
  • Software*
  • Workflow