
Linguistic approach to the classification problem based on the multiset theory
Author(s) -
Liliya Demidova,
Ju S. Sokolova
Publication year - 2021
Publication title -
IOP Conference Series: Materials Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1757-899X
pISSN - 1757-8981
DOI - 10.1088/1757-899x/1047/1/012083
Subject(s) - multiset, binary classification, representation (politics), binary number, visualization, computer science, set (abstract data type), object (grammar), artificial intelligence, data mining, a priori and a posteriori, mathematics, machine learning, pattern recognition (psychology), support vector machine, arithmetic, combinatorics, politics, political science, law, programming language, philosophy, epistemology
This paper considers the problem of developing generalizing decision rules for object classification when knowledge about the values of the objects’ attributes, and about the significance of the attributes themselves, is inaccurate. An approach to the binary classification of objects is proposed that represents inaccurate knowledge with linguistic variables and supports various strategies for forming generalizing decision rules using the tools of multiset theory. An example confirming the effectiveness of the proposed approach is considered: the formation of generalizing binary classification rules for a set of competitive projects evaluated by a group of experts. In addition, the objects are visualized in a two-dimensional space using the non-linear dimensionality reduction algorithm UMAP; the features of each object are the frequencies with which all experts assigned each score on an a priori given rating scale. Based on the results of visualization and cluster analysis of the initial set of competitive projects, a “noise” project that negatively affects the formation of the generalizing binary classification rules was identified and removed from further analysis.
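The frequency-based features mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-grade scale, the project names, the expert scores, and the helper names `score_multiset` and `multiset_sum` are all hypothetical. It shows only how an object can be encoded as a multiset, that is, as the number of times each grade of the rating scale was assigned by the experts, and how two such multisets can be combined with the standard multiset sum.

```python
from collections import Counter

# Hypothetical rating scale; the scale used in the paper is not given here.
SCALE = ("low", "medium", "high")


def score_multiset(scores, scale=SCALE):
    """Return how many experts assigned each grade, in scale order."""
    counts = Counter(scores)
    unknown = set(counts) - set(scale)
    if unknown:
        raise ValueError(f"scores outside the scale: {sorted(unknown)}")
    return [counts[grade] for grade in scale]


def multiset_sum(a, b):
    """Multiset sum: add the multiplicities of each grade."""
    return [x + y for x, y in zip(a, b)]


# Hypothetical scores from three experts for two projects.
projects = {
    "P1": ["high", "high", "medium"],
    "P2": ["low", "medium", "low"],
}

# Each project becomes a vector of grade frequencies.
features = {name: score_multiset(s) for name, s in projects.items()}

# Combining the two projects into one group-level multiset.
combined = multiset_sum(features["P1"], features["P2"])
```

Frequency vectors of this kind are what a dimensionality reduction method such as UMAP would take as input. That step is omitted here because it needs a third-party library.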