Predictive weighting for cluster ensembles
Author(s) - Smyth Christine, Coomans Danny
Publication year - 2007
Publication title - Journal of Chemometrics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.47
H-Index - 92
eISSN - 1099-128X
pISSN - 0886-9383
DOI - 10.1002/cem.1048
Subject(s) - weighting, computer science, similarity (geometry), cluster (spacecraft), artificial intelligence, data mining, matrix (chemical analysis), regression, pattern recognition (psychology), machine learning, algorithm, mathematics, statistics, chemistry, medicine, chromatography, image (mathematics), radiology, programming language
An ensemble of regression models predicts by taking a weighted average of the predictions made by individual models. Calculating the weights such that they reflect the accuracy of individual models (post‐processing the ensemble) has been shown to increase the ensemble's accuracy. However, post‐processing cluster ensembles has not received as much attention because of the inherent difficulty in assessing the accuracy of an individual cluster model. By enforcing the notion that clusters must be ‘predictable’, this paper suggests a means of implicitly post‐processing cluster ensembles by drawing analogies with regression post‐processing techniques. The product of the post‐processing procedure is an intelligently weighted co‐occurrence matrix. A new technique, similarity‐based k‐means (SBK), is developed to split this matrix into clusters. The results using three real‐life datasets underpinned by chemical and biological phenomena show that splitting an intelligently weighted co‐occurrence matrix gives accuracy approaching that of supervised classification methods. Copyright © 2007 John Wiley & Sons, Ltd.
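
As a rough illustration of the weighted co‐occurrence matrix mentioned in the abstract, the Python sketch below accumulates, for every pair of samples, the total weight of the cluster models that place the two samples in the same cluster. The function name `weighted_cooccurrence`, the toy labelings and the weights are all assumptions made for this example; the abstract does not state how the paper derives its weights from cluster predictability, and the SBK splitting step is not shown.

```python
import numpy as np

def weighted_cooccurrence(labelings, weights):
    """Weighted co-occurrence matrix for a cluster ensemble (illustrative only).

    labelings: (n_models, n_samples) integer cluster labels, one row per model.
    weights:   (n_models,) non-negative model weights. These are supplied by the
               caller here; the paper's own weighting scheme is not reproduced.
    """
    labelings = np.asarray(labelings)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalise so entries lie in [0, 1]
    n_samples = labelings.shape[1]
    C = np.zeros((n_samples, n_samples))
    for labels, w in zip(labelings, weights):
        # 1.0 where two samples share a cluster in this model, 0.0 otherwise
        same = (labels[:, None] == labels[None, :]).astype(float)
        C += w * same
    return C

# Toy ensemble: three cluster models over four samples (made-up data).
labelings = [[0, 0, 1, 1],
             [0, 0, 0, 1],
             [1, 1, 0, 0]]
weights = [0.5, 0.25, 0.25]  # hypothetical weights
C = weighted_cooccurrence(labelings, weights)
print(C)
```

With these toy inputs, samples 0 and 1 are grouped together by every model, so their entry is the full weight of 1, whereas samples 0 and 3 never share a cluster and their entry is 0. With equal weights the matrix reduces to the plain fraction of models that agree on each pair.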