Open Access
Role of Pre-processing Phase in Document Clustering Technique for Gurmukhi Script
Author(s) -
Anand Kumar M,
Amandeep Verma
Publication year - 2020
Publication title -
International Journal of Innovative Technology and Exploring Engineering
Language(s) - English
Resource type - Journals
ISSN - 2278-3075
DOI - 10.35940/ijitee.c9105.019320
Subject(s) - computer science, cluster analysis, document clustering, artificial intelligence, normalization (sociology), document processing, natural language processing, pattern recognition (psychology), information retrieval, data mining, sociology, anthropology
Document clustering plays a central role in knowledge discovery and data mining by organizing large data sets into a certain number of groups of data objects, called clusters. Data objects are grouped so that objects in the same cluster are more similar to one another than to the data objects of other clusters. The document clustering technique for Gurmukhi script consists of two phases: 1) a pre-processing phase and 2) a processing phase. This paper concentrates on the pre-processing phase of the document clustering technique for Gurmukhi script. The purpose of the pre-processing phase is to convert unstructured text into a structured text format. The sub-phases of the pre-processing phase are segmentation, tokenization, removal of stop words, stemming, and normalization. This paper aims to present the significant role of the pre-processing phase in the overall performance of the document clustering technique for Gurmukhi script. The experimental results demonstrate this role both in the assignment of data objects to the relevant clusters and in the creation of a meaningful cluster title list.
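The pre-processing sub-phases listed in the abstract can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list `STOP_WORDS`, the suffix list `SUFFIXES`, the `MIN_STEM_LEN` threshold and all function names are hypothetical. Segmentation is assumed to split on the danda (।) and double danda (॥) plus `?`, `!` and `.`. Normalization is assumed to mean Unicode NFC normalization and is applied first so that later string comparisons are consistent; the abstract lists normalization last and may mean something different (for example, mapping spelling variants), which is not reproduced here. The "structured text format" is assumed to be a term-frequency bag of words.

```python
import re
import unicodedata
from collections import Counter

# HYPOTHETICAL, illustrative stop-word list (not the paper's list).
STOP_WORDS = {
    unicodedata.normalize("NFC", w)
    for w in {"ਦਾ", "ਦੇ", "ਦੀ", "ਅਤੇ", "ਹੈ", "ਹਨ", "ਨੂੰ", "ਵਿੱਚ", "ਇਹ", "ਤੋਂ", "ਨੇ"}
}

# HYPOTHETICAL suffixes for a light suffix-stripping stemmer, tried longest first.
SUFFIXES = sorted(["ਿਆਂ", "ਆਂ", "ਾਂ"], key=len, reverse=True)
MIN_STEM_LEN = 2  # assumed minimum stem length (in code points) to avoid over-stripping

# Sentence boundaries: danda, double danda and common Latin punctuation (assumption).
SENTENCE_END = re.compile(r"[।॥?!.]+")
# Gurmukhi block U+0A00-U+0A7F; `\w` would miss vowel signs such as U+0A3F.
GURMUKHI_WORD = re.compile(r"[\u0A00-\u0A7F]+")


def normalize(text: str) -> str:
    """Unicode NFC normalization (assumed meaning of 'normalization')."""
    return unicodedata.normalize("NFC", text)


def segment(text: str) -> list[str]:
    """Split text into non-empty, stripped sentences."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Return runs of Gurmukhi characters as tokens."""
    return GURMUKHI_WORD.findall(sentence)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens found in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token: str) -> str:
    """Strip the longest matching suffix if enough of the word remains."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM_LEN:
            return token[: -len(suffix)]
    return token


def preprocess(document: str) -> Counter:
    """Turn raw Gurmukhi text into a term-frequency bag of stemmed words."""
    terms: Counter = Counter()
    for sentence in segment(normalize(document)):
        tokens = remove_stop_words(tokenize(sentence))
        terms.update(stem(t) for t in tokens)
    return terms


if __name__ == "__main__":
    doc = "ਕੁੜੀਆਂ ਨੇ ਕਿਤਾਬਾਂ ਖਰੀਦੀਆਂ। ਇਹ ਕਿਤਾਬ ਪੰਜਾਬੀ ਵਿੱਚ ਹੈ।"
    print(preprocess(doc))
```

In this sketch, stemming maps ਕਿਤਾਬਾਂ ("books") and ਕਿਤਾਬ ("book") to the same term, so the two sentences share a feature that a clustering step could exploit. Tokens are matched with an explicit Gurmukhi code-point range rather than `\w`, because Python's `\w` does not match Gurmukhi vowel signs (general category Mc), which would split a word such as ਕਿਤਾਬ into fragments.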
