Using String Kernel for Document Clustering
Author(s) - Shi Qingwei, Xiaodong Qiao, Guangquan Xu
Publication year - 2010
Publication title - International Journal of Information Technology and Computer Science
Language(s) - English
Resource type - Journals
eISSN - 2074-9015
pISSN - 2074-9007
DOI - 10.5815/ijitcs.2010.02.06
Subject(s) - computer science, string kernel, kernel (algebra), cluster analysis, string (physics), similarity (geometry), document clustering, artificial intelligence, pattern recognition (psychology), spectral clustering, kernel method, string metric, data mining, information retrieval, radial basis function kernel, string searching algorithm, mathematics, combinatorics, support vector machine, mathematical physics, image (mathematics), pattern matching
In this paper, we present a string kernel based method for document clustering. Documents are viewed as sequences of strings, and the similarity between documents is calculated by the kernel function. A spectral clustering algorithm then groups the documents according to this similarity. Experimental results show that the string kernel method outperforms the standard k-means algorithm on the Reuters-21578 dataset.
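The pipeline described in the abstract (kernel similarity followed by spectral grouping) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which string kernel or which spectral clustering variant is used, so the sketch assumes a character-level p-spectrum kernel with cosine normalization and a simple two-cluster split on the second eigenvector of the normalized similarity matrix. All function names and the parameter p are hypothetical.

```python
import random
from collections import Counter
from math import sqrt


def spectrum_features(text, p=3):
    """Count every contiguous substring of length p (assumed feature map)."""
    return Counter(text[i:i + p] for i in range(len(text) - p + 1))


def spectrum_kernel(a, b, p=3):
    """p-spectrum kernel: inner product of the two substring-count vectors."""
    fa, fb = spectrum_features(a, p), spectrum_features(b, p)
    return sum(count * fb[s] for s, count in fa.items())


def normalized_kernel(a, b, p=3):
    """Cosine-normalized kernel, so documents of different length compare fairly."""
    denom = sqrt(spectrum_kernel(a, a, p) * spectrum_kernel(b, b, p))
    return spectrum_kernel(a, b, p) / denom if denom else 0.0


def similarity_matrix(docs, p=3):
    """Pairwise kernel similarities between all documents."""
    return [[normalized_kernel(x, y, p) for y in docs] for x in docs]


def spectral_bipartition(sim, iters=200, seed=0):
    """Split items into two groups by the sign of the second eigenvector
    of D^{-1/2} S D^{-1/2} (a two-cluster simplification, assumed here)."""
    n = len(sim)
    deg = [sum(row) for row in sim]
    inv_sqrt = [1.0 / sqrt(d) if d > 0 else 0.0 for d in deg]
    m = [[inv_sqrt[i] * sim[i][j] * inv_sqrt[j] for j in range(n)]
         for i in range(n)]

    # The leading eigenvector is proportional to sqrt(deg); deflate it away.
    norm = sqrt(sum(deg)) or 1.0
    lead = [sqrt(d) / norm for d in deg]

    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Remove the component along the leading eigenvector.
        dot = sum(a * b for a, b in zip(v, lead))
        v = [a - dot * b for a, b in zip(v, lead)]
        # One power-iteration step, then renormalize.
        v = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        length = sqrt(sum(a * a for a in v)) or 1.0
        v = [a / length for a in v]
    return [1 if a > 0 else 0 for a in v]


docs = [
    "the cat sat on the mat",
    "the cat sat on a mat",
    "stock prices fell sharply",
    "stock prices rose sharply",
]
labels = spectral_bipartition(similarity_matrix(docs))
```

Which group receives label 0 or 1 depends on the random starting vector, so only the grouping itself is meaningful. For more than two clusters, a common approach is to take several leading eigenvectors and run k-means on the resulting embedding; whether the paper does this is not stated in the abstract.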