Open Access
Evaluation of a digital text classifier based on semantic content through ontologies
Author(s) -
Héctor Daniel Hernández-García,
Dulce J. Navarrete-Arias,
Mario Pérez-Bautista,
Eliud Paredes-Reyes
Publication year - 2020
Publication title -
revista de ingenieria innovativa
Language(s) - English
Resource type - Journals
ISSN - 2523-6873
DOI - 10.35429/joie.2020.15.4.37.44
Subject(s) - computer science , task (project management) , thematic map , sentence , set (abstract data type) , information retrieval , relation (database) , ontology , domain (mathematical analysis) , artificial intelligence , word (group theory) , vector space model , natural language processing , data mining , mathematics , mathematical analysis , philosophy , geometry , cartography , management , epistemology , economics , programming language , geography
Nowadays, the volume of information generated as digital text documents has increased exponentially, creating the need to store these documents on mass storage devices such as high-capacity hard disks, storage servers, and the cloud. However, the documents are usually stored without any thematic organization, which makes searching for information complex. To address this problem, this publication describes the development of a system that classifies a digital text document according to its thematic content. The system uses ontologies to improve the classification by exploiting their characteristics. The system is divided into five tasks: the first counts words to create a frequency vector; the second refines the frequency vector by removing sentence connectors and prepositions; the third sorts the vector from highest to lowest frequency; the fourth takes the most significant set of terms in the frequency vector, applies a domain ontology to it, and looks for the relations among the words to determine the topic of the document; and the fifth organizes the documents into a folder structure based on the identified domains. The system was developed following the incremental development methodology. To validate the system's operation, a set of tests was carried out in a controlled scenario to verify that the documents were classified correctly.
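The five tasks described in the abstract can be sketched as a small pipeline. This is a minimal sketch under stated assumptions, not the authors' implementation: the stop-word list, the cut-off `TOP_K`, the scoring rule, and the toy `ONTOLOGY` (a plain dictionary mapping each hypothetical domain to a set of related terms) are all illustrative stand-ins, since the abstract does not describe the ontology format or how the relations between words and a domain are scored.

```python
import re
import shutil
from collections import Counter
from pathlib import Path
from typing import List, Optional, Tuple

# Hypothetical stop-word list standing in for the sentence connectors and
# prepositions removed in task 2.
STOP_WORDS = {"the", "a", "an", "and", "or", "but", "of", "in", "on", "to",
              "for", "with", "by", "from", "at", "is", "are", "that", "this"}

# Hypothetical toy "ontology": each domain maps to a set of related concepts.
# A real system would query a domain ontology (e.g. OWL/RDF) instead.
ONTOLOGY = {
    "medicine": {"patient", "disease", "treatment", "diagnosis", "clinical"},
    "computing": {"algorithm", "software", "data", "network", "processor"},
}

TOP_K = 10  # hypothetical size of the "most significant" set of terms


def frequency_vector(text: str) -> Counter:
    """Task 1: count word occurrences to build a frequency vector."""
    words = re.findall(r"[a-záéíóúñü]+", text.lower())
    return Counter(words)


def refine(freq: Counter) -> Counter:
    """Task 2: remove connectors and prepositions (stop words)."""
    return Counter({w: c for w, c in freq.items() if w not in STOP_WORDS})


def sort_by_frequency(freq: Counter) -> List[Tuple[str, int]]:
    """Task 3: order terms from highest to lowest frequency."""
    return freq.most_common()


def classify(ranked: List[Tuple[str, int]], ontology=ONTOLOGY,
             k: int = TOP_K) -> Optional[str]:
    """Task 4: score each domain by the summed frequency of top-k terms that
    appear among its concepts; return the best domain, or None if no match."""
    top = ranked[:k]
    scores = {domain: sum(c for w, c in top if w in concepts)
              for domain, concepts in ontology.items()}
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def organize(path: Path, domain: Optional[str], root: Path) -> Path:
    """Task 5: move the document into a folder named after its domain."""
    target_dir = root / (domain or "unclassified")
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(path), str(target_dir / path.name)))


def classify_text(text: str) -> Optional[str]:
    """Run tasks 1 to 4 on a raw text and return the detected domain."""
    return classify(sort_by_frequency(refine(frequency_vector(text))))


if __name__ == "__main__":
    sample = "The patient received treatment after the diagnosis of the disease."
    print(classify_text(sample))  # medicine
```

Summing the frequencies of matched concepts is only one simple way to "seek the relation the words have" with a domain; an actual ontology would also allow matching through synonyms, subclass, or part-of relations, which a flat set of terms cannot express.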
