Multivariate‐bounded Gaussian mixture model with minimum message length criterion for model selection
Author(s) - Muhammad Azam, Nizar Bouguila
Publication year - 2021
Publication title - Expert Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.365
H-Index - 38
eISSN - 1468-0394
pISSN - 0266-4720
DOI - 10.1111/exsy.12688
Subject(s) - computer science , mnist database , cluster analysis , model selection , mixture model , selection (genetic algorithm) , artificial intelligence , representation (politics) , pattern recognition (psychology) , feature selection , bounded function , data mining , gaussian , machine learning , artificial neural network , mathematics , mathematical analysis , politics , political science , law , physics , quantum mechanics
The bounded support Gaussian mixture model (BGMM) has been proposed for data modelling as an alternative to unbounded support mixture models when the data lie within a bounded range. In this paper, we apply the multivariate BGMM to data clustering to provide a more insightful analysis of the model. We also propose the minimum message length (MML) criterion for model selection in data clustering with the multivariate BGMM. The presented model is applied to clustering several speech (TSP and Spoken Digits) and image (MNIST and Fashion MNIST) databases. We further propose the use of BGMM for code-book generation in the feature extraction phase. Inspired by the success of the bag-of-visual-words approach in computer vision, this representation is also introduced for speech data and validated through the experiments presented in this paper. To validate the model selection criterion, MML is applied to different medical, speech and image datasets, and the results obtained are compared with seven other model selection criteria. The results presented in the paper demonstrate the effectiveness of BGMM for clustering speech and image databases, for code-book generation through clustering for feature representation, and for model selection.
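As an illustration of the model-selection workflow the abstract describes, the sketch below fits mixture models with varying numbers of components and keeps the one favoured by an information criterion. It is a minimal sketch only: scikit-learn's unbounded GaussianMixture and its BIC score stand in for the bounded-support mixture and the MML criterion proposed in the paper, which are not available in standard libraries, and the synthetic 2-D data replaces the speech/image features used in the experiments.

```python
# Minimal sketch: pick the number of mixture components with an information
# criterion. Assumptions: sklearn's unbounded GaussianMixture replaces the
# bounded-support mixture (BGMM) and BIC replaces the MML criterion.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 2-D data from three separated clusters (placeholder features).
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[4, 0], scale=0.5, size=(200, 2)),
    rng.normal(loc=[2, 3], scale=0.5, size=(200, 2)),
])

best_model, best_score = None, np.inf
for k in range(1, 8):                                  # candidate model orders
    gmm = GaussianMixture(n_components=k, covariance_type="full",
                          random_state=0).fit(X)
    score = gmm.bic(X)                                 # lower is better
    if score < best_score:
        best_model, best_score = gmm, score

print("selected number of components:", best_model.n_components)
labels = best_model.predict(X)                         # cluster assignments
# The fitted component means can also serve as a code book for a
# bag-of-visual-words style representation, as described in the abstract.
codebook = best_model.means_
```

In the paper itself the per-model score would be the MML message length of the fitted bounded-support mixture rather than BIC, but the selection loop over candidate component counts follows the same pattern.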