Open Access
Combined Chi-Square with k-Means for Document Clustering
Author(s) -
Ammar Ismael Kadhim,
Abood Kirebut Jassim
Publication year - 2021
Publication title -
IOP Conference Series: Materials Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1757-899X
pISSN - 1757-8981
DOI - 10.1088/1757-899x/1076/1/012044
Subject(s) - document clustering, cluster analysis, weighting, computer science, ranking (information retrieval), information retrieval, categorization, data mining, tf–idf, square (algebra), conceptual clustering, fuzzy clustering, artificial intelligence, term (time), cure data clustering algorithm, mathematics, physics, geometry, quantum mechanics, acoustics
Dynamic websites have grown to the point where thousands of documents associated with a given category or topic are available. Most website documents are unstructured and unorganized, so users struggle to find the related documents. A more helpful and efficient approach combines document clustering with ranking: clustering collects similar documents into one category, and ranking is then applied within each cluster to select the best documents in the initial categorization. Besides the specific clustering technique, the term weighting function used to select the features that represent a website document plays a chief part in the clustering task. Document clustering refers to the unsupervised categorization of text documents into clusters such that the documents within a given cluster are similar. This study therefore proposes a new technique that combines chi-square with k-means for clustering website documents. Furthermore, the study implements information gain and chi-square combined with k-means for document clustering. This helps the user obtain all related documents in one cluster. For the experiments, the BBC Sport and BBC News datasets were selected to demonstrate the superiority of the proposed technique. The experimental findings show that chi-square combined with k-means clustering improves the performance of document clustering.
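
The abstract does not give implementation details, so the following is only a minimal sketch of how chi-square term selection might be combined with k-means. It rests on several assumptions that are not taken from the paper: a six-document toy corpus stands in for the BBC datasets, provisional category labels are assumed to be available for the chi-square step (the statistic needs categories to test terms against), each term is scored by its maximum chi-square value over categories, and k-means uses a deterministic farthest-point initialisation for reproducibility. All function and variable names are hypothetical.

```python
def chi_square_scores(docs, labels):
    """Score each term by its largest chi-square statistic over the categories."""
    n = len(docs)
    token_sets = [set(d.split()) for d in docs]
    vocab = sorted(set().union(*token_sets))
    scores = {}
    for term in vocab:
        best = 0.0
        for cat in set(labels):
            pairs = list(zip(token_sets, labels))
            a = sum(1 for s, y in pairs if term in s and y == cat)      # term, in category
            b = sum(1 for s, y in pairs if term in s and y != cat)      # term, other category
            c = sum(1 for s, y in pairs if term not in s and y == cat)  # no term, in category
            d = sum(1 for s, y in pairs if term not in s and y != cat)  # no term, other category
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means; farthest-point initialisation is an assumption made here."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    assign = []
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


# Toy corpus with assumed provisional labels (illustrative only, not BBC data).
corpus = [
    "team wins football match goal",
    "player scores goal in match",
    "coach praises team after match",
    "market shares fall as bank cuts rates",
    "bank profit rises on market rally",
    "shares rally after bank profit report",
]
provisional_labels = ["sport"] * 3 + ["business"] * 3

scores = chi_square_scores(corpus, provisional_labels)
# Keep the four highest-scoring terms; ties are broken alphabetically.
selected = sorted(scores, key=lambda t: (-scores[t], t))[:4]
# Represent each document by its counts of the selected terms only.
vectors = [[doc.split().count(t) for t in selected] for doc in corpus]
assignments = kmeans(vectors, k=2)
print(selected)     # ['bank', 'match', 'goal', 'market']
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

In this toy run, a term such as "after", which occurs equally often in both categories, receives a chi-square score of zero and is dropped, while "match" and "bank" score highest. Restricting the vectors to the selected terms is what lets k-means separate the two groups cleanly here; on real data the number of selected terms and the source of the category labels would need to follow the paper's own setup.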