
INFOLOGICAL MONITORING SYSTEM OF ANALYTICAL DATA UNSTRUCTURED CONTENT
Author(s) -
Sergey N. Mikhailov,
O. E. Klyuchnikova
Publication year - 2017
Publication title -
izvestiâ ûgo-zapadnogo gosudarstvennogo universiteta
Language(s) - English
Resource type - Journals
eISSN - 2686-6757
pISSN - 2223-1560
DOI - 10.21869/2223-1560-2017-21-5-45-61
Subject(s) - computer science , information retrieval , set (abstract data type) , unstructured data , subject (documents) , the internet , service (business) , world wide web , data mining , economy , economics , programming language , big data
This paper proposes an approach to fast information search in unstructured information resources. Four main modules that implement semantic information search are designed and described. The paper also proposes an algorithm for assessing how well the semantic content of text documents corresponds to a given subject domain. The proposed infological approach draws on an analysis of patent searches, published scientific works, and the authors' pilot studies of effective methods for automatically assessing the content of unstructured information resources, with the goal of organizing information and analytical support for scientific activity. The paper proposes a method, built on an infological system, for assessing and comparing the thematic focus of data in unstructured information resources. The method clusters text documents by comparing the semantic content of the text under study with an ontology. The structure of the retrieval subsystem is described: it has a service-oriented client-server architecture with a thin client (a web browser). The method was tested on a set of texts collected by monitoring open public Internet information and communication resources without any restriction on subject (more than 1 million texts were collected and processed). From these texts, experts compiled a training sample covering the following text types: literary texts, scientific and technical articles, pseudo-scientific texts produced by automated systems, and automatically generated spam. The composition and general software architecture of the infological system are described; the main components of the system are cross-platform.
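The abstract does not say how the semantic content of a text is compared with the ontology. As a rough illustration only, the sketch below assumes a toy ontology in which each topic is a plain set of terms (the `ONTOLOGY` dict, its topics, and the `threshold` value are hypothetical). It scores each document against every topic by cosine similarity of term counts and groups documents under their best-matching topic. This is a simple stand-in for the idea of clustering by comparison with an ontology, not the authors' algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical toy ontology: each topic maps to a set of concept terms.
ONTOLOGY = {
    "databases": {"sql", "query", "index", "transaction", "schema"},
    "networks": {"protocol", "router", "packet", "tcp", "latency"},
}

def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def topic_scores(text, ontology=ONTOLOGY):
    """Score one document against every ontology topic."""
    counts = Counter(tokenize(text))
    return {topic: cosine(counts, Counter(terms)) for topic, terms in ontology.items()}

def cluster(docs, ontology=ONTOLOGY, threshold=0.1):
    """Group document ids under their best-matching topic; weak matches go to 'unassigned'."""
    groups = {}
    for doc_id, text in docs.items():
        scores = topic_scores(text, ontology)
        best = max(scores, key=scores.get)
        label = best if scores[best] >= threshold else "unassigned"
        groups.setdefault(label, []).append(doc_id)
    return groups
```

For example, `cluster({"d1": "An SQL query uses an index inside a transaction", "d2": "Each TCP packet passes a router", "d3": "A poem about the sea"})` places `d1` under `databases`, `d2` under `networks`, and `d3` under `unassigned`. A real system would replace raw word matching with the ontology's own semantic relations.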
Based on the results of the pilot studies, the paper shows that automated assessment of the thematic similarity of documents is feasible in principle, using the infological processing of working programs of disciplines (course syllabi) as an example, and formulates requirements for the programming interface through which the prototype interacts with external search engines.
Key words: infological system, assessment of thematic similarity, information resource, working program of discipline, competence, semantic analysis, meaning.
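The abstract does not define its similarity measure for working programs. One simple way to illustrate the idea is to extract the ontology concepts mentioned in each syllabus and compare the two concept sets with the Jaccard coefficient. In the sketch below, the `CONCEPTS` vocabulary and the helper names are hypothetical, and Jaccard similarity is a swapped-in technique chosen for clarity, not the measure used in the paper.

```python
import re

# Hypothetical concept vocabulary standing in for an ontology of competences.
CONCEPTS = {"algorithm", "database", "programming", "statistics", "network", "security"}

def concepts_in(text, concepts=CONCEPTS):
    """Return the set of ontology concepts that occur as whole words in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return words & concepts

def thematic_similarity(text_a, text_b, concepts=CONCEPTS):
    """Jaccard similarity (0.0 to 1.0) of the concept sets found in two documents."""
    a, b = concepts_in(text_a, concepts), concepts_in(text_b, concepts)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With a syllabus mentioning algorithm, database, and programming and another mentioning database, security, network, and programming, the two share 2 of 5 distinct concepts, giving a similarity of 0.4. Exact word matching misses plurals and synonyms, which is the gap a semantic (ontology-driven) comparison is meant to close.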