Clustering de documents dans des collections hétérogènes
Author(s) - Romaric Besançon, Anne-Laure Daquo
Publication year - 2015
Language(s) - English
DOI - 10.24348/coria.2015.50
The goal of document clustering is to organize a collection of documents by topic, either to facilitate access to information or to provide a synthetic view of the content of a text collection. However, when the collection contains different types of documents, clustering quality tends to suffer, because the similarity between documents then depends as much on their type as on their topic. In this article we present a simple approach that takes document type into account in the clustering task, using a feature selection method that exploits the type of each document. We demonstrate the value of this approach through an evaluation on a heterogeneous corpus built specifically for this task.
KEYWORDS: text clustering, heterogeneity, feature selection.
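The abstract does not say which feature selection criterion is used, so the following is only a minimal sketch of the general idea under stated assumptions: terms whose document frequency is concentrated in a single document type are discarded before similarities are computed. The function names (`type_skew`, `select_features`, `cosine`), the `max_skew` threshold, and the toy corpus are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def type_skew(docs, types):
    """Map each term to the largest share of its document frequency
    that falls within a single document type (1.0 = one type only)."""
    df_by_type = defaultdict(Counter)
    for tokens, doc_type in zip(docs, types):
        for term in set(tokens):
            df_by_type[term][doc_type] += 1
    return {t: max(c.values()) / sum(c.values()) for t, c in df_by_type.items()}


def select_features(docs, types, max_skew=0.9):
    """Keep terms that are not dominated by one document type.
    The threshold is an illustrative assumption, not a value from the paper."""
    return {t for t, s in type_skew(docs, types).items() if s <= max_skew}


def cosine(a, b, vocab):
    """Cosine similarity between two token lists, restricted to vocab."""
    va = Counter(t for t in a if t in vocab)
    vb = Counter(t for t in b if t in vocab)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(
        sum(v * v for v in vb.values())
    )
    return dot / norm if norm else 0.0


# Toy corpus (hypothetical): two topics, each written in two document types.
docs = [
    "reported match goal team".split(),       # news, football
    "reported vote ballot candidate".split(),  # news, elections
    "lol match goal team".split(),             # forum, football
    "lol vote ballot candidate".split(),       # forum, elections
]
types = ["news", "news", "forum", "forum"]

all_terms = {t for d in docs for t in d}
kept = select_features(docs, types)

# Same type, different topic: similarity before and after selection.
same_type_before = cosine(docs[0], docs[1], all_terms)
same_type_after = cosine(docs[0], docs[1], kept)
# Same topic, different type: similarity before and after selection.
same_topic_before = cosine(docs[0], docs[2], all_terms)
same_topic_after = cosine(docs[0], docs[2], kept)

print(sorted(kept))
print(same_type_before, same_type_after)
print(same_topic_before, same_topic_after)
```

In this toy setting, the type markers ("reported", "lol") are removed, so documents of the same type no longer look similar unless they also share a topic. A criterion this crude also drops any term seen in only one document, so a real system would likely combine it with a minimum frequency or a statistical association test; the filtered vectors can then be passed to any standard clustering algorithm.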