An adequacy approach for deciding the number of clusters for OTRIMLE robust Gaussian mixture‐based clustering

Author(s) - Christian Hennig, Pietro Coretto

Publication year - 2022

Publication title - Australian and New Zealand Journal of Statistics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.434

H-Index - 41

eISSN - 1467-842X

pISSN - 1369-1473

DOI - 10.1111/anzs.12338

Subject(s) - cluster analysis, mathematics, bayesian information criterion, determining the number of clusters in a data set, mixture model, statistics, parametric statistics, measure (data warehouse), gaussian, statistic, bayesian probability, algorithm, pattern recognition (psychology), data mining, fuzzy clustering, cure data clustering algorithm, computer science, artificial intelligence, physics, quantum mechanics

Summary - We introduce a new approach to deciding the number of clusters. The approach is applied to Optimally Tuned Robust Improper Maximum Likelihood Estimation (OTRIMLE; Coretto & Hennig, 2016, Journal of the American Statistical Association 111, 1648–1659) of a Gaussian mixture model that allows observations to be classified as ‘noise’, but it can be applied to other clustering methods as well. The quality of a clustering is assessed by a statistic Q that measures how close the within‐cluster distributions are to elliptical unimodal distributions whose only mode is at the mean. This non‐parametric measure allows for non‐Gaussian clusters as long as they are of good quality according to Q. The simplicity of a model is assessed by a measure S that prefers a smaller number of clusters unless additional clusters reduce the estimated noise proportion substantially. The model chosen is the simplest one that is adequate for the data, in the sense that its observed value of Q is not significantly larger than expected for data truly generated from the fitted model, as assessed by a parametric bootstrap. The approach is compared with model‐based clustering using the Bayesian information criterion (BIC) and the integrated complete likelihood (ICL) in a simulation study and on two real data sets.
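The selection rule described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `choose_simplest_adequate` and the callables `fit`, `q_stat` and `simulate` are hypothetical placeholders for the clustering method (for example OTRIMLE), the statistic Q and a sampler from the fitted model. The candidates are assumed to be supplied already ordered from simplest to most complex according to S, and the one-sided Monte Carlo p-value is one common way to judge "not significantly larger"; the paper's exact calibration may differ.

```python
import random
from typing import Any, Callable, Optional, Sequence, Tuple


def choose_simplest_adequate(
    data: Sequence[Any],
    candidates: Sequence[Any],
    fit: Callable[[Sequence[Any], Any], Any],
    q_stat: Callable[[Sequence[Any], Any], float],
    simulate: Callable[[Any, int, random.Random], Sequence[Any]],
    n_boot: int = 199,
    alpha: float = 0.05,
    seed: int = 0,
) -> Optional[Tuple[Any, float]]:
    """Return (candidate, p_value) for the first adequate candidate, else None.

    `candidates` must already be ordered from simplest to most complex
    (assumption: this ordering stands in for the simplicity measure S).
    Larger values of Q are taken to mean a worse clustering.
    """
    rng = random.Random(seed)
    for candidate in candidates:
        model = fit(data, candidate)
        q_obs = q_stat(data, model)

        # Parametric bootstrap: draw data from the fitted model, refit with
        # the same candidate, and recompute Q on each simulated data set.
        exceed = 0
        for _ in range(n_boot):
            sim = simulate(model, len(data), rng)
            q_sim = q_stat(sim, fit(sim, candidate))
            if q_sim >= q_obs:
                exceed += 1

        # One-sided Monte Carlo p-value with the usual +1 correction.
        p_value = (exceed + 1) / (n_boot + 1)

        # Adequate: observed Q is not significantly larger than expected.
        if p_value > alpha:
            return candidate, p_value
    return None
```

Because the loop stops at the first adequate candidate, more complex models are never fitted once a simpler one passes, which reflects the "simplest adequate model" idea. The function returns `None` when no candidate is adequate at level `alpha`.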
