Document Categorization Based on Usage of Features Reduction with Synonyms Clustering in Weak Semantic Map
Author(s) - Anastasiya Kostkina, Denis Bodunkov, Valentin Klimov
Publication year - 2018
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.11.061
Subject(s) - computer science, categorization, cluster analysis, document clustering, information retrieval, feature (linguistics), reduction (mathematics), artificial intelligence, natural language processing, text categorization, data mining, philosophy, linguistics, geometry, mathematics
Nowadays, many large companies and corporations have at their disposal a variety of unstructured texts, documents, and other data, but most of this data is still plain text documents with differing subject matter and content. Organizing workflows around data in this format is complicated by its characteristics and requires modern tools for processing and analysis. A possible solution is to apply machine learning algorithms and natural language processing methods while improving existing clustering and classification algorithms. For document classification, we propose a proprietary approach based on using a semantic map as a feature reduction tool. In this paper, we investigate the impact of this approach on the quality of document classification and describe its application to implementing document categorization.