Variable Selection for Clustering with Gaussian Mixture Models
Author(s) -
Maugis Cathy,
Celeux Gilles,
Martin-Magniette Marie-Laure
Publication year - 2009
Publication title -
Biometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.298
H-Index - 130
eISSN - 1541-0420
pISSN - 0006-341X
DOI - 10.1111/j.1541-0420.2008.01160.x
Subject(s) - feature selection , identifiability , Bayesian information criterion , cluster analysis , model selection , variable (mathematics) , mathematics , mixture model , computer science , statistics , artificial intelligence , mathematical analysis , biology
Summary - This article is concerned with variable selection for cluster analysis. The problem is regarded as a model selection problem in the context of model-based cluster analysis. A model generalizing that of Raftery and Dean (2006, Journal of the American Statistical Association 101, 168–178) is proposed to specify the role of each variable. This model does not require any prior assumption about the linear link between the selected and discarded variables. Models are compared with the Bayesian information criterion (BIC). The role of each variable is obtained through an algorithm that embeds two backward stepwise algorithms: one for variable selection in clustering and one for variable selection in linear regression. The identifiability of the model is established, and the consistency of the resulting criterion is proved under regularity conditions. Numerical experiments on simulated datasets and a genomic application highlight the value of the procedure.
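The summary combines two ingredients: comparing models with BIC, and backward stepwise variable selection for linear regression (used to describe how discarded variables relate to selected ones). The standard-library Python sketch below illustrates only the regression half: greedily dropping predictors of a response y while doing so raises the BIC. It is not the authors' algorithm. The function names (`solve`, `rss`, `bic_regression`, `backward_select`), the sign convention BIC = 2 log L − ν log n (larger is better), the elimination-only search (no re-inclusion steps), and the simulated data are all assumptions made for illustration. In the setting described above, one might think of y as a discarded variable and the columns of X as candidate selected variables.

```python
import math
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def rss(predictors, y):
    """Residual sum of squares of the least-squares fit of y on an intercept plus `predictors`."""
    rows = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    p = len(rows[0])
    XtX = [[sum(r[j] * r[k] for r in rows) for k in range(p)] for j in range(p)]
    Xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(p)]
    beta = solve(XtX, Xty)  # normal equations
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2 for r, yi in zip(rows, y))


def bic_regression(predictors, y):
    """BIC = 2 log L - nu log n of a Gaussian linear regression (assumed convention: larger is better)."""
    n = len(y)
    sigma2 = rss(predictors, y) / n  # maximum-likelihood residual variance
    loglik = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)
    nu = len(predictors) + 2  # slopes + intercept + residual variance
    return 2 * loglik - nu * math.log(n)


def backward_select(X, y):
    """Hypothetical simplified backward elimination: drop a predictor while it raises the BIC."""
    selected = list(range(len(X)))
    current = bic_regression([X[j] for j in selected], y)
    while selected:
        candidates = []
        for j in selected:
            rest = [k for k in selected if k != j]
            candidates.append((bic_regression([X[k] for k in rest], y), j))
        best_bic, drop = max(candidates)
        if best_bic <= current:  # no removal improves the criterion: stop
            break
        selected.remove(drop)
        current = best_bic
    return selected


# Usage on simulated data: only X[0] actually drives y.
random.seed(0)
n = 200
X = [[random.gauss(0, 1) for _ in range(n)] for _ in range(3)]
y = [2.0 * X[0][i] + random.gauss(0, 0.5) for i in range(n)]
selected = backward_select(X, y)
print("retained predictors:", selected)
```

The strongly informative predictor X[0] should always be retained; whether a pure-noise column occasionally survives depends on the random draw, because the log n penalty trades goodness of fit against model size rather than guaranteeing exact recovery.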
