Multi‐view predictive partitioning in high dimensions
Author(s) - McWilliams Brian, Montana Giovanni
Publication year - 2012
Publication title - Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11144
Subject(s) - data mining, cluster analysis, computer science, benchmark (surveying), residual, statistic, dimensionality reduction, data point, artificial intelligence, pattern recognition (psychology), algorithm, mathematics, statistics, geodesy, geography
Many modern data mining applications are concerned with the analysis of datasets in which the observations are described by paired high‐dimensional vectorial representations or ‘views’. Typical examples can be found in web mining and genomics applications. In this article, we present an algorithm for data clustering with multiple views, multi‐view predictive partitioning (MVPP), which relies on a novel criterion of predictive similarity between data points. We assume that, within each cluster, the dependence between multivariate views can be modeled by a two‐block partial least squares (TB‐PLS) regression model, which performs dimensionality reduction and is particularly suitable for high‐dimensional settings. The proposed MVPP algorithm partitions the data such that the within‐cluster predictive ability between views is maximized. The proposed objective function depends on a measure of the predictive influence of points under the TB‐PLS model, which has been derived as an extension of the predicted residual sums of squares (PRESS) statistic commonly used in ordinary least squares regression. Using simulated data, we compare the performance of MVPP to that of competing multi‐view clustering methods which rely upon geometric structures of points but ignore the predictive relationship between the two views. State‐of‐the‐art results are obtained on benchmark web mining datasets. © 2012 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2012
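
For readers unfamiliar with the PRESS statistic mentioned in the abstract, the following is a minimal sketch of its standard form for ordinary least squares only. It does not reproduce the article's TB‐PLS extension or the MVPP algorithm. It relies on the well-known identity that the leave-one-out residual equals e_i / (1 - h_ii), where h_ii is the i-th leverage. The function names and the simulated data are illustrative assumptions, not taken from the article.

```python
import numpy as np


def press_residuals(X, y):
    """Leave-one-out residuals for OLS via the leverage shortcut.

    Assumes X (n x p) has full column rank. Illustrative helper,
    not the article's TB-PLS predictive influence measure.
    """
    # Fit OLS on all n points.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    # Leverages h_ii are the diagonal of H = X (X^T X)^{-1} X^T.
    # With a thin QR factorisation X = QR, h_ii is the squared norm of row i of Q.
    Q, _ = np.linalg.qr(X)
    leverage = np.sum(Q ** 2, axis=1)
    # Residual of point i when the model is fitted without point i.
    return residuals / (1.0 - leverage)


def press(X, y):
    """PRESS: sum of squared leave-one-out residuals."""
    return float(np.sum(press_residuals(X, y) ** 2))


def press_brute_force(X, y):
    """Same quantity computed by refitting n times (for checking only)."""
    n = X.shape[0]
    total = 0.0
    for i in range(n):
        keep = np.arange(n) != i
        beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        total += float((y[i] - X[i] @ beta) ** 2)
    return total


# Hypothetical simulated data: intercept plus three predictors.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=30)

print(press(X, y), press_brute_force(X, y))
```

The two printed values should agree up to floating-point error, which shows why PRESS is cheap to evaluate: each point's influence on prediction is available from a single fit, without refitting the model n times.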
