Apprentissage non-supervisé pour la segmentation automatique de textes (Unsupervised Learning for Automatic Text Segmentation)
Author(s) - Jean-François Pessiot, Marc Caillet, Massih-Reza Amini, Patrick Gallinari
Publication year - 2004
Language(s) - English
DOI - 10.24348/coria.2004.213
In this paper we introduce a machine learning approach for automatic text segmentation. Our text segmenter clusters text segments that contain similar concepts. It first discovers the different concepts present in a text, each concept being defined as a set of representative terms. The text is then partitioned into coherent paragraphs using a hard clustering technique based on the Classification Maximum Likelihood approach. We evaluate the effectiveness of this technique on a set of concatenated paragraphs from the … data collection and compare it to a well-established text segmentation technique proposed by Salton et al.
KEYWORDS: Text segmentation, Unsupervised learning, Word partitioning, Classification likelihood.
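The hard clustering step described above can be illustrated with Classification EM (CEM), the usual algorithm for maximizing a classification likelihood. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It skips the concept-discovery (term clustering) step and represents each text segment directly by its term counts. It also assumes a multinomial mixture with additive smoothing and an initialization that follows text order, and it places a boundary wherever two consecutive segments fall into different clusters. The function names `cem_multinomial` and `boundaries`, along with the parameters `k`, `n_iter` and `alpha`, are hypothetical.

```python
import math
from collections import Counter


def cem_multinomial(docs, k, n_iter=20, alpha=0.01):
    """Hard-cluster term-count vectors with Classification EM (CEM).

    Sketch only: assumes a multinomial mixture over terms. docs is a list
    of Counter objects (term -> count), one per text segment.
    Returns one cluster label per segment.
    """
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    # Assumed initialisation: contiguous blocks of segments in text order.
    labels = [i * k // n for i in range(n)]
    for _ in range(n_iter):
        # M-step: estimate proportions and term probabilities from the hard partition.
        priors, theta = [], []
        for c in range(k):
            members = [d for d, lab in zip(docs, labels) if lab == c]
            # Laplace-smoothed proportion so an empty cluster never gives log(0).
            priors.append((len(members) + 1) / (n + k))
            counts = Counter()
            for d in members:
                counts.update(d)
            total = sum(counts.values()) + alpha * len(vocab)
            theta.append({t: (counts[t] + alpha) / total for t in vocab})
        # E-step + C-step: move each segment to its maximum a posteriori cluster.
        new_labels = []
        for d in docs:
            scores = [
                math.log(priors[c])
                + sum(cnt * math.log(theta[c][t]) for t, cnt in d.items())
                for c in range(k)
            ]
            new_labels.append(max(range(k), key=scores.__getitem__))
        if new_labels == labels:
            break  # partition is stable, so CEM has converged
        labels = new_labels
    return labels


def boundaries(labels):
    """Indices where a new paragraph starts (cluster label changes)."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


segments = [
    Counter("stock market shares trading".split()),
    Counter("market shares investors stock".split()),
    Counter("football match goal team".split()),
    Counter("team goal player match".split()),
    Counter("football player team".split()),
    Counter("match team goal football".split()),
]
labels = cem_multinomial(segments, k=2)
print(labels, boundaries(labels))
```

On this toy input, the initial partition wrongly places the third segment with the finance segments. After one C-step it moves to the sports cluster, which yields a single boundary before index 2. Choosing the number of clusters `k`, and working on concept (term-cluster) features rather than raw terms, are left open here.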