
ISSUES ON TRADITIONAL AND MODERN TEXTUAL DOCUMENT CLUSTERING ALGORITHMS
Author(s) - Wael M.S. Yafooz
Publication year - 2016
Publication title - Zenodo (CERN European Organization for Nuclear Research)
Language(s) - English
DOI - 10.5281/zenodo.167097
Subject(s) - document clustering, cluster analysis, computer science, natural language processing, information retrieval, artificial intelligence, algorithm
The amount of digital data used in daily life has grown as dependence on such data has increased. Much of this data is stored in textual documents. As the number of textual documents grows rapidly, users find it increasingly difficult to obtain useful information. A method for organizing the data is therefore required to give users an overview of its content, along with techniques that improve the precision of information retrieval results. Textual document clustering was developed to address these needs by grouping documents into meaningful clusters. The two main concerns in textual document clustering are efficiency and the quality (goodness) of the resulting clusters. Efforts to address these concerns can be categorized as either traditional or modern approaches, and both still face numerous issues. In this paper, we present the past and current issues faced by textual document clustering algorithms to help researchers in the text domain understand them. The study gives researchers and students an overview of textual document clustering algorithms and is intended to encourage researchers to find solutions to these issues.