Design and evaluation of a parallel document clustering algorithm based on hierarchical latent semantic analysis
Author(s) -
Seshadri Karthick,
Iyer K. Viswanathan,
S Mercy Shalinie
Publication year - 2018
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.5094
Subject(s) - generalization , computer science , hierarchy , cluster analysis , hierarchical clustering , hierarchical clustering of networks , directory , scheme (mathematics) , data mining , algorithm , theoretical computer science , correlation clustering , canopy clustering algorithm , artificial intelligence , mathematics , mathematical analysis , economics , market economy , operating system
Summary We propose a parallel generalization scheme for clustering algorithms based on Singular Value Decomposition (SVD). The scheme enables such an algorithm to generate a hierarchy of clusters instead of a flat set of clusters. It infers both the number of levels in the hierarchy and the number of clusters per level automatically, without depending on any user‐supplied parameter. The resulting hierarchical clustering algorithm was evaluated using the web directory taxonomy hosted by the Open Directory (DMOZ). Empirical evaluations and statistical tests indicate that the proposed scheme produces a better cluster hierarchy than two existing generalization techniques in terms of precision, recall, F‐measure, and the Rand index. The scheme is designed to handle large datasets, and the speed‐up of the parallelized scheme over its sequential variant was measured on a multicore computer.
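The summary names precision, recall, F‐measure, and the Rand index as evaluation criteria but does not say how they were computed. As a minimal sketch, assuming the common pair‐counting definitions (the paper may define them differently, for example per level of the hierarchy), the four scores for one flat level could be computed as follows. The function name `pair_counting_scores` and the label lists are hypothetical and used only for illustration.

```python
from itertools import combinations


def pair_counting_scores(predicted, reference):
    """Pair-counting precision, recall, F-measure and Rand index.

    Assumption: `predicted` and `reference` are equal-length sequences of
    cluster labels, one per document. This is an illustrative definition,
    not necessarily the one used in the paper.
    """
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")

    tp = fp = fn = tn = 0
    # Classify every unordered pair of documents by whether the two
    # clusterings agree on placing them together or apart.
    for i, j in combinations(range(len(predicted)), 2):
        same_pred = predicted[i] == predicted[j]
        same_ref = reference[i] == reference[j]
        if same_pred and same_ref:
            tp += 1  # together in both clusterings
        elif same_pred:
            fp += 1  # together only in the predicted clustering
        elif same_ref:
            fn += 1  # together only in the reference clustering
        else:
            tn += 1  # apart in both clusterings

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    total = tp + fp + fn + tn
    rand_index = (tp + tn) / total if total else 1.0

    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "rand_index": rand_index,
    }


if __name__ == "__main__":
    # Hypothetical labels for five documents.
    print(pair_counting_scores([0, 0, 1, 1, 1], [0, 0, 0, 1, 1]))
```

Under this definition, a cluster hierarchy could be scored by applying the function to each level separately against the matching level of the reference taxonomy; whether the paper does so is not stated in the summary.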
