
Research of the efficiency of scientific and technical results in the field of chemical safety based on big data analysis
Author(s) -
С. В. Проничкин,
Igor Mamai
Publication year - 2021
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1942/1/012033
Subject(s) - computer science, multiset, field (mathematics), data mining, representation (politics), information retrieval, unification, data science, mathematics, combinatorics, politics, political science, pure mathematics, law, programming language
The search for and extraction of targeted information about promising and breakthrough technologies for ensuring chemical safety is an important element in the analysis of large volumes of unstructured scientific and technical data. Existing approaches to processing large amounts of unstructured data can distort the original information. New approaches to the search and extraction of target information are proposed, based on typifying the display of visualized large volumes of data from scientific and technical programs. The disadvantages of existing approaches are to be overcome by representing multi-attribute objects with the multiset formalism, which makes it possible to take into account simultaneously all combinations of attribute values as well as a different number of values for each attribute. Multi-attribute objects represented as multisets are then divided into relevant and irrelevant ones according to their similarity to a reference multiset, measured with various metrics. This approach makes it possible to smooth out the peculiarities of the initial data and opens up opportunities for solving new problems in the study of large volumes of unstructured information of various kinds. Computational experiments in the field of chemical engineering have shown the effectiveness of the proposed methodological approaches to the search and extraction of target information from large volumes of unstructured data of scientific and technical programs.
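The idea of representing multi-attribute objects as multisets and splitting them by similarity to a reference multiset can be illustrated with a minimal sketch. It is not the authors' implementation. The helper names (`to_multiset`, `multiset_distance`, `split_by_reference`), the encoding of an object as (attribute, value) pairs, the choice of metric and the threshold rule are all hypothetical assumptions. The metric used here is one common multiset distance, the total multiplicity of the symmetric difference, i.e. the sum of |k_A(x) - k_B(x)| over all elements x. Any of the "various metrics" the abstract mentions could be substituted.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: a multi-attribute object is a list of (attribute, value)
# pairs. An attribute may appear with several values, and a value may repeat.
Feature = Tuple[str, str]


def to_multiset(features: List[Feature]) -> Counter:
    """Represent an object as a multiset: each (attribute, value) pair with its multiplicity."""
    return Counter(features)


def multiset_distance(a: Counter, b: Counter) -> int:
    """Assumed metric: total multiplicity of the symmetric difference,
    sum over all elements x of |k_A(x) - k_B(x)|. Missing keys count as 0."""
    return sum(abs(a[x] - b[x]) for x in set(a) | set(b))


def split_by_reference(
    objects: Dict[str, Counter],
    reference: Counter,
    threshold: int,
    metric: Callable[[Counter, Counter], int] = multiset_distance,
) -> Tuple[List[str], List[str]]:
    """Hypothetical rule: an object is relevant if its distance to the
    reference multiset does not exceed the threshold, otherwise irrelevant."""
    relevant: List[str] = []
    irrelevant: List[str] = []
    for name, ms in objects.items():
        if metric(ms, reference) <= threshold:
            relevant.append(name)
        else:
            irrelevant.append(name)
    return relevant, irrelevant


if __name__ == "__main__":
    # Illustrative, made-up data (not from the paper).
    reference = to_multiset([("topic", "catalysis"), ("topic", "toxicity"), ("method", "simulation")])
    projects = {
        "proj_a": to_multiset([("topic", "catalysis"), ("topic", "toxicity"), ("method", "experiment")]),
        "proj_b": to_multiset([("topic", "polymers"), ("method", "survey")]),
    }
    print(split_by_reference(projects, reference, threshold=2))
    # prints (['proj_a'], ['proj_b'])
```

Because `Counter` stores multiplicities, an attribute with several values (or a repeated value) is kept in full rather than collapsed into a single field. The `metric` parameter is there so that a normalized similarity or another multiset metric can be swapped in without changing the splitting logic.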