Open Access
K-Means Document Clustering using Vector Space Model
Author(s) -
R. Malathi Ravindran,
Antony Selvadoss Thanamani
Publication year - 2015
Publication title -
Bonfring International Journal of Data Mining
Language(s) - English
Resource type - Journals
eISSN - 2277-5048
pISSN - 2250-107X
DOI - 10.9756/bijdm.8076
Subject(s) - vector space model, cluster analysis, computer science, space (punctuation), document clustering, vector (molecular biology), information retrieval, artificial intelligence, biology, biochemistry, gene, recombinant dna, operating system
Document clustering is the grouping of similar documents into classes, where similarity is some function defined on the documents. Document clustering requires neither a separate training process nor manual tagging of groups in advance. Documents in the same cluster are more similar to one another, while documents in different clusters are more dissimilar. It is a familiar technique in data analysis and is used in many areas, including data mining, statistics and image analysis. Traditional clustering approaches lose their effectiveness when handling high-dimensional data. To address this, a new K-Means clustering technique is proposed in this work. Here, the cosine similarity of the Vector Space Model is used as the centroid criterion for clustering. Using this approach, documents can be clustered efficiently even when the dimension is high, because the vector space representation of documents is well suited to high dimensions.
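The abstract does not spell out the algorithm, so the following is only a minimal sketch of one common way to combine K-Means with cosine similarity over a vector space representation (often called spherical K-Means). It is not the authors' implementation. The TF-IDF weighting, the smoothed IDF formula, the whitespace tokeniser, the seed-index initialisation, and the function names `tfidf_vectors`, `cosine`, and `kmeans_cosine` are all assumptions made for illustration.

```python
import math
from collections import Counter


def normalise(vec):
    """Scale a sparse vector (term -> weight) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Map each document to a unit-length sparse TF-IDF vector."""
    tokenised = [doc.lower().split() for doc in docs]  # naive tokeniser (assumption)
    n = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF (assumption) keeps weights positive for common terms.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        vectors.append(normalise(vec))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors (their dot product)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_cosine(vectors, seeds, max_iter=20):
    """Assign each vector to its most cosine-similar centroid until stable.

    `seeds` lists the indices of the vectors used as initial centroids
    (a deterministic choice made here for reproducibility).
    """
    centroids = [dict(vectors[i]) for i in seeds]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda j: cosine(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        for j in range(len(centroids)):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if total:  # an empty cluster keeps its previous centroid
                centroids[j] = normalise(total)
    return labels


docs = [
    "cats purr and cats sleep",
    "dogs bark and dogs run",
    "cats sleep all day",
    "dogs run and bark loudly",
]
labels = kmeans_cosine(tfidf_vectors(docs), seeds=[0, 1])
print(labels)
```

On this toy corpus the two cat documents and the two dog documents land in separate clusters. Because every vector is normalised to unit length, cosine similarity reduces to a sparse dot product, so the cost of each comparison depends on the number of terms a document actually contains rather than on the full vocabulary size.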
