
Evaluación de un clasificador de textos digitales basado en el contenido semántico a través de ontologías [Evaluation of a digital text classifier based on semantic content through ontologies]
Author(s) -
Héctor Daniel Hernández-García,
Navarrete-Arias Dulce J.,
Mario Pérez-Bautista,
Eliud Paredes-Reyes
Publication year - 2020
Publication title -
Revista de Ingeniería Innovativa
Language(s) - English
Resource type - Journals
ISSN - 2523-6873
DOI - 10.35429/joie.2020.15.4.37.44
Subject(s) - computer science, task (project management), thematic map, sentence, set (abstract data type), information retrieval, relation (database), ontology, domain (mathematical analysis), artificial intelligence, word (group theory), vector space model, natural language processing, data mining, mathematics, mathematical analysis, philosophy, geometry, cartography, management, epistemology, economics, programming language, geography
The generation of information as digital text documents has increased exponentially, creating a need to store documents on mass storage such as high-capacity hard disks, storage servers, and the cloud. This storage, however, usually lacks thematic organization, which makes searching for information difficult. To address this problem, this publication describes the development of a system that classifies a digital text document according to its thematic content. The system uses ontologies, taking advantage of their characteristics to achieve a better classification. It is divided into five tasks: the first counts words to create a frequency vector; the second refines the frequency vector by removing sentence connectors and prepositions; the third sorts the vector from highest to lowest frequency; the fourth takes the most significant entries of the frequency vector, applies a domain ontology to them, and looks for the relations among the words to determine the theme of the document; and the fifth organizes the documents in a folder structure based on the identified domains. The system was developed following the incremental development methodology. To validate its operation, a set of tests was carried out in a controlled scenario to verify that the documents were classified correctly.
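The five tasks above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the stop-word list, the toy `ONTOLOGIES` dictionary (a real ontology would also encode relations between concepts), the cutoff `k`, and all function names are assumptions introduced here.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed stop-word list (connectors and prepositions); the paper does not give one.
STOP_WORDS = {"the", "a", "an", "and", "or", "but", "of", "in", "on", "to",
              "for", "with", "by", "from", "so", "however", "therefore"}

# Assumed toy stand-in for domain ontologies: each domain maps to a set of concept terms.
ONTOLOGIES = {
    "medicine": {"patient", "disease", "treatment", "diagnosis", "symptom"},
    "computing": {"algorithm", "software", "network", "database", "server"},
}


def frequency_vector(text):
    """Task 1: count words to build a frequency vector."""
    return Counter(re.findall(r"[a-záéíóúüñ]+", text.lower()))


def refine(freqs):
    """Task 2: drop sentence connectors and prepositions."""
    return Counter({w: c for w, c in freqs.items() if w not in STOP_WORDS})


def top_terms(freqs, k=10):
    """Tasks 3 and 4a: sort by descending frequency and keep the k most frequent terms."""
    return freqs.most_common(k)


def classify(terms, ontologies=ONTOLOGIES):
    """Task 4b: score each domain by the frequency mass of terms found in its ontology."""
    scores = {
        domain: sum(count for word, count in terms if word in concepts)
        for domain, concepts in ontologies.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"


def target_folder(root, filename, domain):
    """Task 5: compute where the document would be filed (no file is moved here)."""
    return Path(root) / domain / filename


def classify_document(text, k=10):
    """Run tasks 1 to 4 on a document's text and return the detected domain."""
    return classify(top_terms(refine(frequency_vector(text)), k))
```

For example, `classify_document(text)` returns a domain name that `target_folder("library", "report.txt", domain)` turns into a destination path; actually moving the file (for instance with `shutil.move`) is left out so the sketch has no side effects.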