Open Access
Learning Document Similarity Using Natural Language Processing
Author(s) - Paola Merlo, James Henderson, Gerold Schneider, Éric Wehrli
Publication year - 2003
Publication title - Linguistik online
Language(s) - English
Resource type - Journals
ISSN - 1615-3014
DOI - 10.13092/lo.17.788
Subject(s) - computer science, self organizing map, natural language processing, similarity (geometry), representation (politics), information retrieval, artificial intelligence, natural language, scale (ratio), artificial neural network, image (mathematics), physics, quantum mechanics, politics, political science, law
The recent considerable growth in the amount of easily available on-line text has brought to the foreground the need for large-scale natural language processing tools for text data mining. In this paper we address the problem of organizing documents into meaningful groups according to their content and of visualizing a text collection, providing an overview of the range of documents and of their relationships so that they can be browsed more easily. We use Self-Organizing Maps (SOMs) (Kohonen 1984). Creating these maps poses considerable efficiency challenges. We study linguistically motivated ways of reducing the representation of a document to increase efficiency, as well as ways of disambiguating the words in the documents.
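
To make the idea of placing documents on a Self-Organizing Map concrete, the following is a minimal sketch in plain Python. It is not the authors' system: the toy corpus, the unit-length term-frequency representation, the 3x3 grid, the Gaussian neighbourhood, and the linear decay schedule are all assumptions chosen for illustration, and the function names (`vectorize`, `best_matching_unit`, `train_som`) are hypothetical.

```python
import math
import random
import re

# Hypothetical toy corpus; the collection used in the paper is not shown here.
DOCS = [
    "neural network training with gradient descent",
    "gradient descent optimizes neural network weights",
    "parsing natural language with a grammar",
    "a grammar describes natural language syntax",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(docs):
    """Return unit-length term-frequency vectors and the sorted vocabulary."""
    vocab = sorted({tok for d in docs for tok in tokenize(d)})
    index = {tok: i for i, tok in enumerate(vocab)}
    vectors = []
    for d in docs:
        vec = [0.0] * len(vocab)
        for tok in tokenize(d):
            vec[index[tok]] += 1.0
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors, vocab


def best_matching_unit(weights, vec):
    """Grid coordinate of the unit whose weight vector is closest to vec."""
    return min(
        weights,
        key=lambda pos: sum((w - x) ** 2 for w, x in zip(weights[pos], vec)),
    )


def train_som(vectors, rows=3, cols=3, epochs=40, lr0=0.5, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) to a weight vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    radius0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                     # assumed linear decay
        radius = max(radius0 * (1.0 - frac), 0.5)   # shrinking neighbourhood
        for vec in vectors:
            br, bc = best_matching_unit(weights, vec)
            for (r, c), w in weights.items():
                dist2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-dist2 / (2.0 * radius ** 2))
                # Pull this unit toward the document, scaled by grid proximity.
                for i in range(dim):
                    w[i] += lr * h * (vec[i] - w[i])
    return weights


if __name__ == "__main__":
    vectors, vocab = vectorize(DOCS)
    som = train_som(vectors)
    for doc, vec in zip(DOCS, vectors):
        print(best_matching_unit(som, vec), doc)
```

Each document ends up at the grid cell of its best-matching unit, so the grid can serve as a browsable overview of the collection. Every training step compares a document with every unit across the full vocabulary, which is one way to see why a smaller document representation would reduce the cost of building the map.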
