Sparse kernel k-means clustering

J Appl Stat. 2024 Jun 5;52(1):158-182. doi: 10.1080/02664763.2024.2362266. eCollection 2025.

Abstract

Clustering is an essential technique that groups similar data points to uncover the underlying structure and features of the data. Although traditional clustering methods such as k-means are widely utilized, they have limitations in identifying nonlinear clusters. Thus, alternative techniques, such as kernel k-means and spectral clustering, have been developed to address this issue. However, another challenge arises when irrelevant variables are present in the data; this can be mitigated by employing variable selection methods such as the filter, wrapper, and embedded approaches. In this study, with a particular focus on kernel k-means clustering, we propose an embedded variable selection method using a tensor product space along with a general analysis of variance kernel for nonlinear clustering. Comprehensive experiments involving simulations and real data analysis demonstrated that the proposed method achieves competitive performance compared to existing approaches. Thus, the proposed method may serve as a reliable tool for accurate cluster identification and variable selection to gain insights into complex datasets.

Keywords: Nonlinear clustering; analysis of variance kernel; sparse learning; variable selection.
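
As a rough illustration of the idea summarized in the abstract, the following sketch runs kernel k-means on a weighted additive kernel in which each variable contributes its own one-dimensional Gaussian kernel. This is only a first-order, ANOVA-type construction assumed for illustration and not the authors' estimator. The function names (`additive_rbf_kernel`, `kernel_kmeans`), the `weights` and `gamma` parameters, and the toy data are all hypothetical. The weights are fixed by hand here, whereas an embedded method would estimate them jointly with the cluster assignments. The point is that a zero weight removes a variable from the kernel entirely, which is how sparsity in the weights translates into variable selection.

```python
import numpy as np

def additive_rbf_kernel(X, weights, gamma=1.0):
    """Weighted sum of one-dimensional Gaussian kernels, one per variable.

    Assumed first-order ANOVA-type form: K(x, y) = sum_j w_j * exp(-gamma * (x_j - y_j)^2).
    A variable with weight 0 does not influence the kernel at all.
    """
    n, p = X.shape
    K = np.zeros((n, n))
    for j in range(p):
        if weights[j] == 0.0:
            continue  # variable j is excluded from the kernel
        diff = X[:, j][:, None] - X[:, j][None, :]
        K += weights[j] * np.exp(-gamma * diff ** 2)
    return K

def kernel_kmeans(K, n_clusters, n_iter=50, seed=0):
    """Plain kernel k-means (Lloyd-style updates) on a precomputed kernel matrix."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(n_clusters, size=n)  # random initial assignment
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)  # empty clusters stay at infinity
        for c in range(n_clusters):
            mask = labels == c
            m = mask.sum()
            if m == 0:
                continue
            # Squared feature-space distance to the cluster mean:
            # K_ii - (2/m) * sum_{j in c} K_ij + (1/m^2) * sum_{j,l in c} K_jl
            dist[:, c] = (diag
                          - 2.0 * K[:, mask].sum(axis=1) / m
                          + K[np.ix_(mask, mask)].sum() / m ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments no longer change
        labels = new_labels
    return labels

# Toy data (hypothetical): three variables, the third is given weight 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
weights = np.array([1.0, 1.0, 0.0])

K = additive_rbf_kernel(X, weights, gamma=0.5)
labels = kernel_kmeans(K, n_clusters=2)

# Replacing the zero-weight variable leaves the kernel unchanged.
X_perturbed = X.copy()
X_perturbed[:, 2] = rng.normal(size=40) * 100.0
K_perturbed = additive_rbf_kernel(X_perturbed, weights, gamma=0.5)
```

Because the third weight is zero, `K` and `K_perturbed` coincide, so any clustering computed from the kernel ignores that variable. Like ordinary k-means, this iteration can stop at a local optimum, so several random restarts would normally be used in practice.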

Grants and funding

Beomjin Park was supported by a research grant from Gyeongsang National University in 2022. Changyi Park was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (2022M3J6A1084845). Hosik Choi was supported by the Basic Science Research Program through the NRF funded by the Ministry of Education (2017R1D1A1B05028565).