Evaluación de un clasificador de textos digitales basado en el contenido semántico a través de ontologías

Author(s) - Héctor Daniel Hernández-García, Navarrete-Arias Dulce J., Mario Pérez-Bautista, Eliud Paredes-Reyes

Open Access

Publication year - 2020

Publication title - Revista de Ingeniería Innovativa

Language(s) - English

Resource type - Journals

ISSN - 2523-6873

DOI - 10.35429/joie.2020.15.4.37.44

Subject(s) - computer science, task (project management), thematic map, sentence, set (abstract data type), information retrieval, relation (database), ontology, domain (mathematical analysis), artificial intelligence, word (group theory), vector space model, natural language processing, data mining, mathematics, management, economics, cartography, geography, epistemology, philosophy, mathematical analysis, geometry, programming language

The volume of information generated as digital text documents has grown exponentially, creating a need to keep documents on mass storage such as high-capacity hard disks, storage servers, and the cloud. This storage, however, usually lacks any thematic organization, which makes searching for information difficult. To address this problem, this paper describes the development of a system that classifies a digital text document according to its thematic content. The system uses ontologies, taking advantage of their characteristics, to achieve a better classification. It is divided into five tasks: the first counts words to build a frequency vector; the second refines the frequency vector by removing sentence connectors and prepositions; the third sorts the vector from highest to lowest frequency; the fourth takes the most significant entries of the frequency vector, applies a domain ontology to them, and looks for the relations among the words in order to determine the topic of the document; and the fifth organizes the documents into a folder structure based on the identified domains. The system was built following an incremental development methodology. To validate it, a set of tests was run in a controlled scenario to verify that documents were classified correctly.
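The five tasks described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract does not detail. The stop-word list, the domain names, the cutoff `k`, and every function name are hypothetical, and a flat set of terms per domain stands in for a real ontology, which would also encode relations between concepts.

```python
import re
import shutil
from collections import Counter
from pathlib import Path

# Hypothetical stop-word list (connectors and prepositions); a real system
# would use a fuller, language-specific list.
STOP_WORDS = {"the", "and", "or", "but", "of", "in", "on", "to", "a", "an",
              "for", "with", "by", "each", "is", "are"}

# Hypothetical stand-in for domain ontologies: each domain maps to a flat
# set of related terms. A real ontology would also model relations.
DOMAIN_TERMS = {
    "networking": {"router", "protocol", "packet", "network"},
    "databases": {"query", "table", "index", "database"},
}


def word_frequencies(text):
    """Task 1: count words to build a frequency vector."""
    return Counter(re.findall(r"[a-záéíóúñü]+", text.lower()))


def remove_stop_words(freqs):
    """Task 2: drop connectors and prepositions from the vector."""
    return Counter({w: n for w, n in freqs.items() if w not in STOP_WORDS})


def top_terms(freqs, k=10):
    """Tasks 3 and 4a: sort by descending frequency and keep the top k."""
    return freqs.most_common(k)


def identify_domain(terms):
    """Task 4b: score each domain by the frequency of its matching terms."""
    scores = {
        domain: sum(n for w, n in terms if w in vocab)
        for domain, vocab in DOMAIN_TERMS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def classify_text(text, k=10):
    """Run tasks 1 to 4 and return a domain name, or None if nothing matches."""
    freqs = remove_stop_words(word_frequencies(text))
    return identify_domain(top_terms(freqs, k))


def file_document(path, root):
    """Task 5: copy the document into a folder named after its domain."""
    domain = classify_text(path.read_text(encoding="utf-8")) or "unclassified"
    target = root / domain
    target.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(path, target / path.name))
```

In this sketch, calling `file_document(Path("notes.txt"), Path("sorted"))` would copy a hypothetical `notes.txt` into `sorted/<domain>/`, or into `sorted/unclassified/` when no domain term appears among the top entries.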
