Unsupervised Learning with Term Clustering for Thematic Segmentation of Texts
Author(s) - Marc Caillet, Jean-François Pessiot, Massih-Reza Amini, Patrick Gallinari
Publication year - 2004
Language(s) - English
DOI - 10.5555/2816272.2816331
In this paper we introduce a machine learning approach to automatic text segmentation. Our text segmenter clusters text segments that contain similar concepts. It first discovers the different concepts present in a text, each concept being defined as a set of representative terms. The text is then partitioned into coherent paragraphs using a clustering technique based on the Classification Maximum Likelihood approach. We evaluate the effectiveness of this technique on sets of concatenated paragraphs from two collections, the 7 Sectors and the 20 Newsgroups corpora, and compare it to a baseline text segmentation technique proposed by Salton et al.
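
The two-step idea in the abstract (discover concepts as sets of terms, then partition the text by concept) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes hard-assignment clustering of binary term-occurrence vectors with a squared-distance criterion, which is a k-means-style simplification of classification-likelihood clustering, and it labels each paragraph by majority vote over its terms' concepts. All function names (`term_vectors`, `cluster_terms`, `segment`), the farthest-point initialization, and the toy data are hypothetical.

```python
from collections import Counter


def term_vectors(paragraphs):
    """Map each term to a binary vector marking the paragraphs it occurs in."""
    para_sets = [set(p) for p in paragraphs]
    vocab = sorted(set().union(*para_sets))
    return {t: [1.0 if t in s else 0.0 for s in para_sets] for t in vocab}


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_terms(vectors, k, max_iter=50):
    """Hard-assignment clustering of terms into k concepts (assumed simplification)."""
    terms = list(vectors)
    # Deterministic farthest-point initialization (an assumption of this sketch).
    centroids = [list(vectors[terms[0]])]
    while len(centroids) < k:
        far = max(terms, key=lambda t: min(sq_dist(vectors[t], c) for c in centroids))
        centroids.append(list(vectors[far]))
    assign = {}
    for _ in range(max_iter):
        # Classification step: each term goes to its nearest concept centroid.
        new_assign = {
            t: min(range(k), key=lambda j: sq_dist(vectors[t], centroids[j]))
            for t in terms
        }
        if new_assign == assign:
            break
        assign = new_assign
        # Re-estimation step: each centroid becomes the mean of its member terms.
        for j in range(k):
            members = [vectors[t] for t in terms if assign[t] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def segment(paragraphs, k):
    """Label each non-empty paragraph by its dominant concept; report label changes."""
    assign = cluster_terms(term_vectors(paragraphs), k)
    labels = [Counter(assign[t] for t in p).most_common(1)[0][0] for p in paragraphs]
    boundaries = [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]
    return labels, boundaries


# Toy input: two finance paragraphs followed by two sports paragraphs.
toy = [
    ["stock", "market", "bank"],
    ["bank", "stock", "market"],
    ["team", "goal", "match"],
    ["match", "team", "goal"],
]
labels, boundaries = segment(toy, k=2)
print(labels, boundaries)
```

On this toy input the sketch places a single boundary before the third paragraph. Real text would need tokenization, stop-word removal, and a principled choice of the number of concepts, none of which the abstract specifies.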
