An efficient hybrid distributed document clustering algorithm
Author(s) - Eid J, J. Jayakumari
Publication year - 2015
Publication title - Scientific Research and Essays
Language(s) - English
Resource type - Journals
ISSN - 1992-2248
DOI - 10.5897/sre2014.6107
Subject(s) - cluster analysis , computer science , speedup , particle swarm optimization , canopy clustering algorithm , data mining , document clustering , correlation clustering , data stream clustering , cure data clustering algorithm , curse of dimensionality , algorithm , parallel computing , machine learning
Recent advances in information technology have pushed data volumes beyond the petabyte scale. Clustering distributed document sets from a central location is difficult because of the computational resources it demands, so distributed document clustering algorithms are needed to cluster documents using distributed resources. The greatest challenge in distributed document clustering is maintaining clustering quality and speedup as document sets grow. The proposed algorithm, PKMeansLSI, is a hybrid of Particle Swarm Optimization (PSO), K-Means clustering, and Latent Semantic Indexing (LSI), and uses the MapReduce framework for distributed computation. This hybrid improves the clustering quality of the algorithm. The MapReduce framework, through its Hadoop implementation, serves as the distributed programming model and improves the speedup of the algorithm. Execution time drops substantially as the dimensionality of the documents is reduced. Experimental results show improved quality and effectiveness of the hybrid algorithm across increasing document sizes.
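
As a rough illustration of how a K-Means iteration can be split into map and reduce steps, the sketch below runs both steps in a single Python process. It is not the authors' PKMeansLSI implementation: the PSO and LSI stages and Hadoop itself are omitted, and all function names (`map_assign`, `reduce_centroid`, `kmeans_iteration`, `kmeans`) are hypothetical. Document vectors are assumed to be already dimensionality-reduced lists of floats.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_assign(doc_vector, centroids):
    """Map step (hypothetical name): emit (nearest centroid index, (vector, 1))."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: squared_distance(doc_vector, centroids[i]),
    )
    return nearest, (doc_vector, 1)


def reduce_centroid(values):
    """Reduce step (hypothetical name): average the vectors emitted for one key."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for vec, n in values:
        for j in range(dim):
            total[j] += vec[j]
        count += n
    return [t / count for t in total]


def kmeans_iteration(docs, centroids):
    """One K-Means pass: map every document, group by key, reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        key, value = map_assign(doc, centroids)
        groups[key].append(value)
    # A centroid with no assigned documents is kept unchanged (an assumption).
    return [
        reduce_centroid(groups[i]) if i in groups else list(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(docs, centroids, max_iter=20):
    """Repeat iterations until the centroids stop changing or max_iter is hit."""
    for _ in range(max_iter):
        updated = kmeans_iteration(docs, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

In a real MapReduce job the map calls would run in parallel over document partitions and the framework would perform the grouping by key; the in-process `defaultdict` here only stands in for that shuffle phase.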
