Spectrum: fast density-aware spectral clustering for single and multi-omic data

Christopher R John; David Watson; Michael R Barnes; Costantino Pitzalis; Myles J Lewis

doi:10.1093/bioinformatics/btz704


Bioinformatics. 2020 Feb 15;36(4):1159-1166. doi: 10.1093/bioinformatics/btz704.

Authors

Christopher R John¹, David Watson²,³, Michael R Barnes¹,³, Costantino Pitzalis¹, Myles J Lewis¹

Affiliations

¹ Centre for Experimental Medicine and Rheumatology, William Harvey Research Institute, Bart's and The London School of Medicine and Dentistry, Queen Mary University of London, London EC1M 6BQ, UK.
² Oxford Internet Institute, University of Oxford, Oxford OX1 3JS, UK.
³ The Alan Turing Institute, London NW1 2DB, UK.

Abstract

Motivation: Clustering patient omic data is integral to developing precision medicine because it allows the identification of disease subtypes. A current major challenge is the integration of multi-omic data to identify a shared structure and reduce noise. Cluster analysis is also increasingly applied to single-omic data, for example in single-cell RNA-seq analysis for clustering the transcriptomes of individual cells, a technology with clinical implications. Our motivation was therefore to develop a flexible and effective spectral clustering tool for both single and multi-omic data.

Results: We present Spectrum, a new spectral clustering method for complex omic data. Spectrum uses a self-tuning, density-aware kernel that we developed, which enhances the similarity between points that share common nearest neighbours. It uses a tensor product graph data integration and diffusion procedure to reduce noise and reveal underlying structures. Spectrum also includes a new method for finding the optimal number of clusters (K) based on eigenvector distribution analysis, and can automatically find K for both Gaussian and non-Gaussian structures. Across 21 real expression datasets, we demonstrate that Spectrum gives faster runtimes and better clustering results than other methods.
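The abstract only names the ingredients, so the following Python sketch (Python is used purely for illustration; Spectrum itself is an R package) shows one way a self-tuning, density-aware kernel and a K-estimation step could fit together. It assumes local scaling in the style of Zelnik-Manor and Perona, with σ_i taken as the distance from point i to its n_scale-th nearest neighbour, and a common-nearest-neighbour count CNN(i, j) that enlarges the kernel width, giving A_ij = exp(-d_ij² / (σ_i σ_j (CNN(i, j) + 1))). The function names, parameters and defaults are hypothetical and are not the Spectrum package's API. For K it uses the classic eigengap heuristic only; it does not reproduce Spectrum's eigenvector distribution analysis for non-Gaussian structures or its tensor product graph integration and diffusion.

```python
import numpy as np


def density_aware_kernel(X, n_scale=3, n_common=7):
    """Hypothetical sketch of a self-tuning, density-aware affinity matrix.

    Assumes distinct points (so every sigma_i > 0) and more than
    max(n_scale, n_common) points. Parameter names and defaults are
    illustrative, not the Spectrum package's.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Row-wise neighbour ordering; column 0 is the point itself.
    order = np.argsort(D, axis=1)
    # Local scale: distance to the n_scale-th nearest neighbour.
    sigma = D[np.arange(n), order[:, n_scale]]
    # member[i, k] == 1 if k is among i's n_common nearest neighbours.
    member = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), n_common)
    member[rows, order[:, 1:n_common + 1].ravel()] = 1
    # cnn[i, j] = number of nearest neighbours shared by i and j.
    cnn = member @ member.T
    # More shared neighbours -> wider kernel -> higher similarity.
    A = np.exp(-D ** 2 / (np.outer(sigma, sigma) * (cnn + 1)))
    np.fill_diagonal(A, 0.0)
    return A


def estimate_k_eigengap(A, max_k=10):
    """Classic eigengap heuristic on the normalised affinity D^-1/2 A D^-1/2."""
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse them.
    evals = np.linalg.eigvalsh(L)[::-1]
    # gaps[i] is the drop between the (i+1)-th and (i+2)-th largest eigenvalues.
    gaps = evals[:max_k] - evals[1:max_k + 1]
    return int(np.argmax(gaps)) + 1


if __name__ == "__main__":
    # Three well-separated 2-D Gaussian blobs of 20 points each.
    rng = np.random.default_rng(0)
    centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
    X = np.vstack([c + 0.5 * rng.standard_normal((20, 2)) for c in centres])
    A = density_aware_kernel(X)
    print("estimated K:", estimate_k_eigengap(A))
```

Because σ_i adapts to each point's neighbourhood, the kernel needs no global bandwidth, and dividing by CNN(i, j) + 1 raises the similarity of pairs that sit in the same dense region. The eigengap step works well when clusters are roughly Gaussian and well separated; the abstract indicates that Spectrum goes further to handle non-Gaussian structures.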

Availability and implementation: Spectrum is available as an R software package from CRAN https://cran.r-project.org/web/packages/Spectrum/index.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Cluster Analysis
Humans
Precision Medicine*
Single-Cell Analysis
Software*
Transcriptome
