Concept‐relational text clustering

Author(s) - Antoon Bronselaer, Guy De Tré

Publication year - 2012

Publication title - International Journal of Intelligent Systems

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.291

H-Index - 87

eISSN - 1098-111X

pISSN - 0884-8173

DOI - 10.1002/int.21557

Subject(s) - computer science, relevance (law), cluster analysis, sentence, document clustering, context (archaeology), information retrieval, space (punctuation), search engine indexing, set (abstract data type), a priori and a posteriori, data mining, artificial intelligence, paleontology, philosophy, epistemology, political science, law, biology, programming language, operating system

The ongoing exponential growth of online information sources has led to a need for reliable and efficient algorithms for text clustering. In this paper, we propose a novel text model called the relational text model that represents each sentence as a binary multirelation over a concept space $\mathcal{C}$. Through use of the smart indexing engine (SIE), a patented technology of the Belgian company i.Know, the concept space adopted by the text model can be constructed dynamically. This means that there is no need for an a priori knowledge base such as an ontology, which makes our approach context independent. The concepts resulting from SIE possess the property that the frequency of a concept is a measure of its relevance. We exploit this property with the development of the CR‐algorithm. Our approach relies on the representation of a data set $\mathcal{D}$ as a multirelation, of which k‐cuts can be taken. These cuts can be seen as sets of relevant patterns with respect to the topics that are described by documents. Analysis of dependencies between patterns allows clusters to be produced such that precision is sufficiently high. The best k‐cut is the one that best approximates the estimated number of clusters to ensure recall. Experimental results on Dutch news fragments show that our approach outperforms both basic and advanced methods. © 2012 Wiley Periodicals, Inc.
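The notion of a k‐cut of a multirelation can be illustrated with a small sketch. It is not the CR‐algorithm and does not use SIE. It rests on two assumptions that the abstract does not confirm: that a sentence's multirelation counts ordered pairs of concepts occurring in that sentence, and that a k‐cut keeps the pairs whose multiplicity is at least k. The function names and the sample concepts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def sentence_multirelation(concepts):
    """Count ordered concept pairs (a, b) where a precedes b in the sentence.

    Assumption: this pairing rule is illustrative only; the paper's exact
    definition of the binary multirelation is not given in the abstract.
    """
    return Counter(combinations(concepts, 2))


def dataset_multirelation(sentences):
    """Sum the sentence multirelations into one multirelation for the data set."""
    total = Counter()
    for concepts in sentences:
        total += sentence_multirelation(concepts)
    return total


def k_cut(multirelation, k):
    """Return the set of pairs whose multiplicity is at least k (assumed definition)."""
    return {pair for pair, count in multirelation.items() if count >= k}


# Hypothetical concept sequences, one list per sentence.
sentences = [
    ["election", "minister", "vote"],
    ["election", "vote"],
    ["storm", "coast"],
    ["election", "minister"],
]

relation = dataset_multirelation(sentences)
for k in (1, 2, 3):
    print(k, sorted(k_cut(relation, k)))
```

Under these assumptions the cuts are nested: raising k can only remove pairs, so a higher cut keeps fewer, more frequent patterns. That matches the abstract's remark that concept frequency is a measure of relevance.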
