
Optimization of K Value at the K-NN algorithm in clustering using the expectation maximization algorithm
Author(s) -
Zulkarnain Lubis,
Poltak Sihombing,
Herman Mawengkang
Publication year - 2020
Publication title -
IOP Conference Series: Materials Science and Engineering
Language(s) - English
Resource type - Journals
eISSN - 1757-899X
pISSN - 1757-8981
DOI - 10.1088/1757-899x/725/1/012133
Subject(s) - cluster analysis, algorithm, value (mathematics), k nearest neighbors algorithm, computer science, data mining, set (abstract data type), data set, statistical classification, pattern recognition (psychology), k means clustering, artificial intelligence, machine learning, programming language
Data is the most important element of a study: the quality of research results is directly proportional to the quality of the data used. One common problem in data sets is missing data, that is, the absence of a value for a particular attribute. A method often used by researchers is k-nearest neighbor (KNN). However, this method has several drawbacks, one of which is the need to select an appropriate value of k so that classification performance does not degrade. The choice of the parameter k affects the accuracy of the classification results. When k is greater than 1, the class is determined by majority voting among the k neighbors. If k = 1, the result is very rigid, because only the single nearest neighbor determines the classification. Conversely, if k is large, the classification results become blurred. This research optimizes the parameter k in the UN tax cluster using the expectation maximization (EM) algorithm. The research produces clustering information for two cases: the number of clusters obtained with optimization of the value of k, and the number of clusters obtained without it. The clustered data are then analyzed. The results show that the k obtained from the optimization algorithm improves the clustering results: the error is reduced from 66% to 64%, which is very close to the best accuracy measured in the tests.
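
The effect of k on majority voting described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not include the EM step; the function name `knn_predict`, the toy data, and the choice of Euclidean distance are all assumptions made for illustration.

```python
from collections import Counter
import math


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs (hypothetical format).
    """
    # Sort training points by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count labels among the neighbors and return the most frequent one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data (assumed): class "A" around the origin, class "B" around (5, 5),
# plus one noisy "B" point that sits inside the "A" region.
train = [
    ((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "A"), ((1.0, 1.0), "A"),
    ((5.0, 5.0), "B"), ((5.0, 6.0), "B"), ((6.0, 5.0), "B"), ((6.0, 6.0), "B"),
    ((0.4, 0.4), "B"),  # noisy point
]
query = (0.5, 0.5)  # lies in the middle of the "A" region

for k in (1, 3, 5, 9):
    print(k, knn_predict(train, query, k))
```

In this toy setting, k = 1 follows the single noisy neighbor, moderate values of k recover the surrounding class, and k equal to the whole training set simply returns the globally most frequent label. That is the trade-off the abstract refers to when it calls small k "tight" and large k "blurred", and it is why a principled way of choosing k is useful.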