
Improve BIRCH algorithm for big data clustering
Author(s) -
Fanny Ramadhani,
Muhammad Zarlis,
Saib Suwilo
Publication year - 2020
Publication title -
IOP Conference Series: Materials Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1757-899X
pISSN - 1757-8981
DOI - 10.1088/1757-899x/725/1/012090
Subject(s) - cluster analysis, algorithm, tree (set theory), node (physics), computer science, matching (statistics), big data, mathematics, data mining, statistics, artificial intelligence, combinatorics, engineering, structural engineering
Big Data refers to collections of data with very large volumes and highly diverse sources, which must be managed with methods and tools suited to that scale. Clustering is one of the effective techniques for dealing with Big Data. The hierarchical method with the BIRCH algorithm achieves short execution times, and BIRCH is a clustering algorithm well suited to very large data sets. In the algorithm, a CF-tree is built in which all entries in each leaf node must satisfy the same threshold T, and the CF-tree is rebuilt at each stage with a different threshold. However, a static (fixed) threshold produces poor cluster quality. This paper proposes a solution to this deficiency by making the threshold value dynamic, so that the algorithm produces good cluster quality, which is validated using the silhouette coefficient (SC). There is a very clear difference between the standard BIRCH algorithm and the BIRCH algorithm with the modified T parameter (BIRCH (CF-Leaf (modif))). The number of CF-Nodes, the total CF-Entries, and the total CF-Leaf Entries are 60% lower than the corresponding values in the standard BIRCH algorithm.
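
The role of the threshold T and of the silhouette coefficient can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce their dynamic-threshold rule, which the abstract does not specify. It assumes a flat list of CF entries instead of a full CF-tree, uses the cluster radius as the quantity compared against T, and the names `CFEntry`, `cluster_with_threshold`, and `silhouette_coefficient` are hypothetical.

```python
import math


class CFEntry:
    """Clustering feature: point count N, linear sum LS, sum of squared norms SS."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, point):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius_if_added(self, point):
        # Radius after a tentative insert: R^2 = SS/N - ||LS/N||^2
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        r2 = ss / n - sum((v / n) ** 2 for v in ls)
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding errors


def cluster_with_threshold(points, threshold):
    """Assumed flat simplification of leaf-level insertion (no tree, no rebuild).

    A point joins the nearest entry only if that entry's radius stays <= threshold;
    otherwise it starts a new entry.
    """
    entries, labels = [], []
    for p in points:
        best = None
        if entries:
            best = min(range(len(entries)),
                       key=lambda i: math.dist(entries[i].centroid(), p))
        if best is not None and entries[best].radius_if_added(p) <= threshold:
            entries[best].add(p)
            labels.append(best)
        else:
            entry = CFEntry(len(p))
            entry.add(p)
            entries.append(entry)
            labels.append(len(entries) - 1)
    return entries, labels


def silhouette_coefficient(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != label)
        total += (b - a) / max(a, b)
    return total / len(points)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for t in (0.1, 1.0):
        entries, labels = cluster_with_threshold(data, t)
        print(t, len(entries), silhouette_coefficient(data, labels))
```

On this toy data a very small T splits every point into its own entry, while a larger T merges nearby points into two entries. That sensitivity to T is what motivates adjusting the threshold instead of fixing it, with the silhouette coefficient used to compare the resulting clusterings.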