Document Clustering using Self-Organizing Maps

Muhammad Rafi, Muhammad Waqar, Hareem Ajaz, Umar Ayub, Muhammad Danish


Open Access


Publication year - 2017

Publication title - Mendel

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.221

H-Index - 13

eISSN - 1803-3822

pISSN - 1803-3814

DOI - 10.13164/mendel.2017.1.111

Subject(s) - computer science, self organizing map, cluster analysis, document clustering, artificial intelligence, dimension (graph theory), feature (linguistics), cluster (spacecraft), set (abstract data type), hierarchical clustering, pattern recognition (psychology), data mining, information retrieval, mathematics, linguistics, philosophy, pure mathematics, programming language

Cluster analysis of textual documents is a common technique for better filtering, navigation, understanding, and comprehension of large document collections. Document clustering is an autonomous method that separates a large heterogeneous document collection into smaller, more homogeneous sub-collections called clusters. A self-organizing map (SOM) is a type of artificial neural network (ANN) that can perform autonomous self-organization of a high-dimensional feature space into low-dimensional projections called maps. It is considered a good method for clustering because both require unsupervised processing. In this paper, we propose a SOM that uses multiple layers and multiple features to cluster documents. The paper implements a SOM with four layers: the bottom layers contain lexical terms, phrases, and sequences, respectively, and all of them are combined at the top layer. The documents are processed to extract these features, which are then fed to the SOM. The internal weights and interconnections between the features (neurons) of these layers settle automatically over iterations with a small learning rate to discover the actual clusters. We performed an extensive set of experiments on standard text mining datasets, namely NEWS20, Reuters, and WebKB, using the evaluation measures F-Measure and Purity. The evaluation gives encouraging results, and the approach outperforms some existing approaches. We conclude that a SOM with multiple features (lexical terms, phrases, and sequences) and multiple layers can be very effective in producing high-quality clusters on large document collections.
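To make the SOM training loop and the Purity measure mentioned in the abstract concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' four-layer, multi-feature architecture: it assumes a single flat layer, a Gaussian neighbourhood, and linearly decaying learning rate and radius. The function names (train_som, assign_clusters, purity), the parameter defaults, and the toy term-count vectors are all hypothetical and chosen only for illustration.

```python
import numpy as np

def train_som(data, grid_shape=(2, 2), n_iter=500, lr0=0.5, sigma0=1.0, seed=0):
    """Train a single-layer SOM (illustrative only, not the paper's model).

    Returns a weight matrix of shape (rows * cols, n_features).
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Grid coordinates of each neuron, used by the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialise each neuron's weights from a randomly chosen input vector.
    weights = data[rng.integers(0, len(data), size=rows * cols)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                  # decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
        x = data[rng.integers(0, len(data))]     # pick one document vector
        # Best-matching unit: the neuron whose weights are closest to x.
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Gaussian neighbourhood centred on the BMU, measured on the grid.
        grid_dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        # Pull every neuron towards x, scaled by learning rate and neighbourhood.
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_clusters(data, weights):
    """Assign each input vector to the index of its nearest neuron."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def purity(labels_true, labels_pred):
    """Purity: fraction of items that belong to the majority class of their cluster."""
    labels_true = np.asarray(labels_true)
    labels_pred = np.asarray(labels_pred)
    total = 0
    for c in np.unique(labels_pred):
        members = labels_true[labels_pred == c]
        total += np.bincount(members).max()
    return total / len(labels_true)

# Hypothetical toy "documents": term-count vectors over a 3-word vocabulary.
docs = np.array([
    [5, 0, 1], [4, 1, 0], [6, 0, 0],   # topic 0
    [0, 5, 1], [1, 4, 0], [0, 6, 1],   # topic 1
], dtype=float)
topics = [0, 0, 0, 1, 1, 1]

som_weights = train_som(docs, grid_shape=(2, 2), n_iter=500)
clusters = assign_clusters(docs, som_weights)
print("cluster assignments:", clusters)
print("purity:", purity(topics, clusters))
```

Purity alone rewards over-fragmented clusterings (one cluster per document scores 1.0), which is one reason it is usually reported together with a measure such as the F-Measure, as the abstract does.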
