Comparing representative selection strategies for dissimilarity representations
Author(s) - Reynolds Zane, Bunke Horst, Last Mark, Kandel Abraham
Publication year - 2006
Publication title - International Journal of Intelligent Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.291
H-Index - 87
eISSN - 1098-111X
pISSN - 0884-8173
DOI - 10.1002/int.20180
Subject(s) - dimensionality reduction, computer science, principal component analysis, projection (relational algebra), curse of dimensionality, projection pursuit, euclidean distance, representation (politics), set (abstract data type), euclidean space, artificial intelligence, space (punctuation), selection (genetic algorithm), pattern recognition (psychology), data set, data mining, mathematics, algorithm, politics, political science, pure mathematics, law, programming language, operating system
Many of the computational intelligence techniques currently used do not scale well in data type or computational performance, so selecting the right dimensionality reduction technique for the data is essential. By employing a dimensionality reduction technique called representative dissimilarity to create an embedded space, large spaces of complex patterns can be simplified to a fixed‐dimensional Euclidean space of points. The only current suggestions as to how the representatives should be selected are principal component analysis, projection pursuit, and factor analysis. Several alternative representative selection strategies are proposed and empirically evaluated on a set of term vectors constructed from HTML documents. The results indicate that a representative dissimilarity representation with at least 50 representatives can achieve a significant increase in classification speed with a minimal sacrifice in accuracy. When the representatives are selected randomly, the time required to create the embedded space is also significantly reduced, again with only a small penalty in accuracy. © 2006 Wiley Periodicals, Inc. Int J Int Syst 21: 1093–1109, 2006.
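The embedding described in the abstract can be illustrated with a minimal sketch: each pattern is mapped to the vector of its dissimilarities to a fixed set of representatives, so k representatives give a k-dimensional point. The function names, the toy term vectors, and the use of cosine distance are assumptions made for illustration; the abstract does not say which dissimilarity measure the authors used, and only the random selection strategy is shown here.

```python
import math
import random


def cosine_distance(a, b):
    """Cosine distance between two equal-length term vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Treat an all-zero vector as maximally dissimilar to everything.
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_random_representatives(patterns, k, seed=0):
    """Pick k representatives uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(patterns, k)


def embed(pattern, representatives, distance=cosine_distance):
    """Map one pattern to its vector of dissimilarities to the representatives."""
    return [distance(pattern, rep) for rep in representatives]


def embed_all(patterns, representatives, distance=cosine_distance):
    """Embed every pattern; each result has len(representatives) coordinates."""
    return [embed(p, representatives, distance) for p in patterns]


# Hypothetical toy term vectors standing in for HTML documents.
documents = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
    [2.0, 0.0, 4.0, 0.0],
]
representatives = select_random_representatives(documents, k=2)
embedded = embed_all(documents, representatives)
```

Any classifier that works on fixed-length numeric vectors could then be trained on `embedded` instead of the original patterns. Random selection needs no pairwise comparison of the patterns before embedding, which is consistent with the reduced construction time reported in the abstract.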