Variable selection in penalized model-based clustering via regularization on grouped parameters

Benhuai Xie; Wei Pan; Xiaotong Shen

doi:10.1111/j.1541-0420.2007.00955.x

Biometrics. 2008 Sep;64(3):921-930. doi: 10.1111/j.1541-0420.2007.00955.x. Epub 2007 Dec 20.

Authors

Benhuai Xie¹, Wei Pan¹, Xiaotong Shen²

Affiliations

¹ Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, Minnesota 55455 U.S.A.
² School of Statistics, University of Minnesota, Minneapolis, Minnesota 55455 U.S.A.

PMID: 18162109
DOI: 10.1111/j.1541-0420.2007.00955.x

Abstract

Penalized model-based clustering has been proposed for high-dimensional data with small sample sizes, such as those arising from genomic studies; in particular, it can be used for variable selection. A new regularization scheme is proposed that groups together the multiple parameters of the same variable across clusters; it is shown both analytically and numerically to be more effective for variable selection than the conventional L1 penalty. In addition, we develop a strategy to combine this grouping scheme with the grouping of structured variables. Simulation studies and applications to microarray gene expression data for cancer subtype discovery demonstrate the advantage of the new proposal over several existing approaches.
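
The contrast between the two penalties can be illustrated with a minimal sketch. The abstract does not state the penalty formulas, so the following is an assumption: the data are taken to be standardized so that a variable whose cluster means are all zero is non-informative, the L1 penalty is taken as the sum of absolute cluster means, and the grouped penalty is taken as the Euclidean norm of each variable's vector of cluster means. The function names are hypothetical, and the thresholding functions are the standard proximal operators of these two penalties, not the paper's actual estimation updates.

```python
import math

# Each variable is represented by the list of its cluster means
# (one entry per cluster). All names here are illustrative assumptions.

def l1_penalty(means):
    """Sum of absolute values of every cluster mean of every variable."""
    return sum(abs(m) for var in means for m in var)

def grouped_penalty(means):
    """Sum over variables of the Euclidean norm of that variable's cluster means."""
    return sum(math.sqrt(sum(m * m for m in var)) for var in means)

def soft_threshold(var, lam):
    """Proximal operator of the L1 penalty: shrinks each mean separately."""
    return [math.copysign(max(abs(m) - lam, 0.0), m) for m in var]

def group_soft_threshold(var, lam):
    """Proximal operator of the grouped penalty: shrinks the whole vector,
    setting all of a variable's cluster means to zero together or not at all."""
    norm = math.sqrt(sum(m * m for m in var))
    if norm <= lam:
        return [0.0] * len(var)
    scale = 1.0 - lam / norm
    return [scale * m for m in var]

informative = [0.9, -0.2, 0.1]   # one cluster mean clearly away from zero
noise = [0.2, -0.25, 0.2]        # all cluster means close to zero

# L1 thresholding zeroes individual means of the informative variable,
# while group thresholding keeps or drops the variable as a whole.
print(soft_threshold(informative, 0.5))
print(group_soft_threshold(informative, 0.5))
print(group_soft_threshold(noise, 0.5))
```

Under these assumptions, a variable is dropped from the clustering only when all of its cluster means are zero, which the grouped operator decides in a single step for the whole vector.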

Publication types

Research Support, N.I.H., Extramural
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Analysis of Variance
Artificial Intelligence
Biometry / methods*
Cluster Analysis*
Gene Expression Profiling / statistics & numerical data
Genomics / statistics & numerical data
Humans
Leukemia / classification
Leukemia / genetics
Models, Statistical*
