
A semantic approach for text document clustering using frequent itemsets and WordNet
Author(s) -
Harsha Patil,
Ramjeevan Singh Thakur
Publication year - 2018
Publication title -
International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v7i2.9.10220
Subject(s) - wordnet , cluster analysis , document clustering , computer science , information retrieval , semantic similarity , similarity (geometry) , function (biology) , data mining , natural language processing , artificial intelligence , evolutionary biology , biology , image (mathematics)
Document clustering is an unsupervised method for grouping documents into clusters on the basis of their similarity. A document is assigned to a specific cluster on the basis of its membership score, which is calculated through a membership function. However, many traditional clustering algorithms rely only on the bag-of-words (BOW) model, which ignores the semantic similarity between a document and a cluster. In this research, we consider the semantic association between the cluster and the text document when calculating the membership score of a document for a specific cluster. Several researchers are working on the semantic aspects of document clustering to improve clustering performance, and many external knowledge bases, such as WordNet, Wikipedia, and Lucene, are utilized for this purpose. The proposed approach exploits WordNet to improve the cluster membership function. The experimental results show that clustering quality improves significantly with the proposed semantic framework.
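To illustrate the difference between a BOW-only membership score and a semantically informed one, the following is a minimal sketch and not the authors' implementation. It assumes that each cluster is described by a set of terms (for example, terms drawn from a frequent itemset) and that the membership score is a Jaccard overlap; both choices are assumptions made for illustration. The small `TOY_SYNSETS` dictionary is a hypothetical stand-in for a WordNet synset lookup, and all function names are invented here.

```python
from typing import Dict, Iterable, Set

# Hypothetical stand-in for WordNet: maps a word to a shared concept label.
# A real system would look up synsets in WordNet instead of this toy table.
TOY_SYNSETS: Dict[str, str] = {
    "car": "vehicle.n",
    "automobile": "vehicle.n",
    "auto": "vehicle.n",
    "doctor": "physician.n",
    "physician": "physician.n",
}


def to_concepts(words: Iterable[str]) -> Set[str]:
    """Replace each word by its concept label; unknown words stay as-is."""
    return {TOY_SYNSETS.get(w.lower(), w.lower()) for w in words}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def bow_membership(doc: Iterable[str], cluster_terms: Iterable[str]) -> float:
    """Membership score from exact word overlap only (bag of words)."""
    return jaccard({w.lower() for w in doc}, {w.lower() for w in cluster_terms})


def semantic_membership(doc: Iterable[str], cluster_terms: Iterable[str]) -> float:
    """Membership score after mapping synonyms to a common concept."""
    return jaccard(to_concepts(doc), to_concepts(cluster_terms))


def assign(doc: Iterable[str], clusters: Dict[str, Iterable[str]]) -> str:
    """Pick the cluster with the highest semantic membership score."""
    doc_words = list(doc)
    return max(clusters, key=lambda name: semantic_membership(doc_words, clusters[name]))


if __name__ == "__main__":
    clusters = {
        "transport": ["car", "engine"],
        "health": ["doctor", "hospital"],
    }
    doc = ["automobile", "engine"]
    print(bow_membership(doc, clusters["transport"]))
    print(semantic_membership(doc, clusters["transport"]))
    print(assign(doc, clusters))
```

In this toy setting, the document containing "automobile" shares only one literal word with the "transport" cluster, yet it matches the cluster fully once synonyms are mapped to the same concept. This is the kind of gap between BOW and semantic scoring that the abstract describes.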