Variable selection in penalized model-based clustering via regularization on grouped parameters

Biometrics. 2008 Sep;64(3):921-930. doi: 10.1111/j.1541-0420.2007.00955.x. Epub 2007 Dec 20.

Abstract

Penalized model-based clustering has been proposed for high-dimensional data with small sample sizes, such as those arising from genomic studies; in particular, it can be used for variable selection. A new regularization scheme is proposed that groups together the multiple parameters of the same variable across clusters; it is shown both analytically and numerically to be more effective for variable selection than the conventional L1 penalty. In addition, we develop a strategy to combine this grouping scheme with the grouping of structured variables. Simulation studies and applications to microarray gene expression data for cancer subtype discovery demonstrate the advantage of the new proposal over several existing approaches.
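The contrast between an element-wise L1 penalty and a penalty that groups one variable's parameters across clusters can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes the penalized parameters are cluster means of standardized variables (so a variable whose means are zero in every cluster is uninformative), it assumes an L2-norm group penalty, and the function names, the toy matrix mu, and the value of lam are all hypothetical.

```python
import math

def soft_threshold(mu, lam):
    """Element-wise (L1-style) shrinkage: each cluster mean is shrunk on its own."""
    return [[math.copysign(max(abs(m) - lam, 0.0), m) for m in row] for row in mu]

def group_soft_threshold(mu, lam):
    """Grouped shrinkage: all cluster means of one variable are scaled together,
    so a variable is either kept in every cluster or zeroed in every cluster."""
    n_clusters, n_vars = len(mu), len(mu[0])
    out = [[0.0] * n_vars for _ in range(n_clusters)]
    for j in range(n_vars):
        # L2 norm of variable j's means across clusters (assumed grouping)
        norm = math.sqrt(sum(mu[k][j] ** 2 for k in range(n_clusters)))
        scale = max(1.0 - lam / norm, 0.0) if norm > 0 else 0.0
        for k in range(n_clusters):
            out[k][j] = scale * mu[k][j]
    return out

def mixed_variables(mu):
    """Indices of variables that are zero in some clusters but not in others."""
    mixed = []
    for j in range(len(mu[0])):
        column = [row[j] for row in mu]
        n_zero = sum(1 for m in column if m == 0.0)
        if 0 < n_zero < len(column):
            mixed.append(j)
    return mixed

# Hypothetical toy input: rows are clusters, columns are variables.
mu = [[0.9, 0.3, 2.0],
      [-0.4, -0.2, -1.5]]
lam = 0.5

l1 = soft_threshold(mu, lam)
grp = group_soft_threshold(mu, lam)

print("L1 mixed variables:     ", mixed_variables(l1))
print("Grouped mixed variables:", mixed_variables(grp))
```

In this toy input, element-wise shrinkage zeroes the first variable in one cluster only, whereas the grouped version treats each variable as a unit. That all-or-nothing behaviour per variable is what makes a grouped penalty a natural fit for variable selection.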

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Analysis of Variance
  • Artificial Intelligence
  • Biometry / methods*
  • Cluster Analysis*
  • Gene Expression Profiling / statistics & numerical data
  • Genomics / statistics & numerical data
  • Humans
  • Leukemia / classification
  • Leukemia / genetics
  • Models, Statistical*