GRAPH BASED CLUSTERING WITH CONSTRAINTS AND ACTIVE LEARNING
Author(s) -
Vu-Tuan Dang,
Viet-Vu Vu,
Hong-Quan Do,
Thi Kieu Oanh Le
Publication year - 2021
Publication title -
Journal of Computer Science and Cybernetics (Vietnam Academy of Science and Technology)
Language(s) - English
Resource type - Journals
eISSN - 2815-5939
pISSN - 1813-9663
DOI - 10.15625/1813-9663/37/1/15773
Subject(s) - cluster analysis, computer science, correlation clustering, canopy clustering algorithm, constrained clustering, CURE data clustering algorithm, data stream clustering, data mining, artificial intelligence, machine learning, conceptual clustering, graph, fuzzy clustering, theoretical computer science
Over the past few years, semi-supervised clustering has emerged as an active direction in machine learning research. A semi-supervised clustering algorithm can significantly improve its results by using side information that is either already available or collected from users. Two main kinds of side information are used in semi-supervised clustering algorithms: class labels, called seeds, and pairwise constraints. The first semi-supervised clustering algorithm was introduced in 2000, and many algorithms have been presented in the literature since then. However, it is not easy to use both types of side information in the same algorithm. To address this problem, this paper proposes a semi-supervised graph-based clustering algorithm, called MCSSGC, that aims to use both seeds and constraints in the clustering process. Moreover, we introduce a simple but efficient active learning method, named KMMFFQS, that collects constraints to boost the performance of MCSSGC. To verify the effectiveness of the proposed algorithm, we conducted a series of experiments not only on real data sets from UCI but also on a document data set used in an information extraction application for Vietnamese documents. The results show that the proposed algorithm can significantly improve clustering performance compared to some recent algorithms.
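To make the two kinds of side information concrete, the following minimal Python sketch shows how seeds (labeled points) relate to pairwise constraints. It is not the MCSSGC algorithm or the KMMFFQS method, neither of which is specified in the abstract. The function names `constraints_from_seeds` and `count_violations` are hypothetical, and the must-link/cannot-link terminology is the usual convention in constrained clustering rather than wording taken from this paper.

```python
from itertools import combinations


def constraints_from_seeds(seeds):
    """Derive pairwise constraints from seeds.

    seeds: dict mapping a point index to its known class label.
    Two seeds with the same label yield a must-link pair;
    two seeds with different labels yield a cannot-link pair.
    """
    must_link, cannot_link = set(), set()
    for (i, label_i), (j, label_j) in combinations(sorted(seeds.items()), 2):
        if label_i == label_j:
            must_link.add((i, j))
        else:
            cannot_link.add((i, j))
    return must_link, cannot_link


def count_violations(assignment, must_link, cannot_link):
    """Count how many pairwise constraints a cluster assignment breaks.

    assignment: list where assignment[i] is the cluster id of point i.
    """
    broken_ml = sum(assignment[i] != assignment[j] for i, j in must_link)
    broken_cl = sum(assignment[i] == assignment[j] for i, j in cannot_link)
    return broken_ml + broken_cl


# Illustrative data (assumed, not from the paper): 5 points, 3 of them seeded.
seeds = {0: "a", 1: "a", 4: "b"}
must_link, cannot_link = constraints_from_seeds(seeds)

consistent = [0, 0, 1, 1, 1]    # respects every derived constraint
inconsistent = [0, 1, 1, 1, 1]  # splits points 0 and 1, merges points 1 and 4
```

The sketch also hints at why combining the two is not symmetric: seeds always imply pairwise constraints, but a set of pairwise constraints does not in general determine class labels.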
