
Unbalanced Data Clustering with K-Means and Euclidean Distance Algorithm Approach Case Study Population and Refugee Data
Author(s) -
NM Faizah,
Surohman,
L. Fabrianto,
Hendra Hendra,
Riris Prasetyo
Publication year - 2020
Publication title -
Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1477/2/022005
Subject(s) - refugee , cluster analysis , euclidean distance , cluster (spacecraft) , population , k means clustering , geography , computer science , statistics , mathematics , demography , sociology , artificial intelligence , programming language , archaeology
Many datasets are unbalanced and show no obvious pattern, which makes them difficult to classify. National population figures are one example: totals vary enormously between countries, and even more so when compared with the number of refugees from each country. China and India have more than 2 billion people, yet their refugees amount to only 0.01% of the population, while in Syria around 70% of the more than 18 million residents are refugees. Using the K-Means algorithm, we can group countries with similar population and refugee counts; the average percentage of refugees relative to population in each cluster then characterizes that cluster. The methodology of this study is to measure the distance between data points with the Euclidean distance formula, run the K-Means algorithm, calculate the percentage value for each cluster, and draw conclusions from the characteristics of the clusters formed. We found that machine learning can derive a pattern from the data without reference to political or social issues: K-Means groups every country into clusters that make sense.
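
The steps described in the abstract (Euclidean distance, K-Means, per-cluster refugee percentage) might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the country labels and figures are made-up placeholders (in millions), the choice of k = 3, the first-k centroid initialization, and the function names `euclidean`, `kmeans`, and `cluster_refugee_share` are all assumptions, and no feature scaling is applied because the abstract does not mention any.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, max_iter=100):
    """Plain Lloyd-style K-Means; the first k points seed the centroids (assumption)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def cluster_refugee_share(points, labels, k):
    """Mean refugee percentage (refugees / population * 100) per cluster."""
    shares = {}
    for c in range(k):
        pct = [100.0 * r / p for (p, r), l in zip(points, labels) if l == c]
        if pct:
            shares[c] = sum(pct) / len(pct)
    return shares

# Hypothetical (population, refugees) pairs in millions; NOT real statistics.
countries = ["A", "C", "E", "B", "D", "F"]
data = [
    (1400.0, 0.2),
    (18.0, 12.0),
    (60.0, 0.5),
    (1350.0, 0.1),
    (20.0, 10.0),
    (50.0, 0.3),
]

labels, centroids = kmeans(data, k=3)
shares = cluster_refugee_share(data, labels, k=3)

for name, label in zip(countries, labels):
    print(f"country {name}: cluster {label}")
for cluster, share in sorted(shares.items()):
    print(f"cluster {cluster}: mean refugee share {share:.2f}%")
```

On raw counts like these, population dominates the distance because its range is far larger than that of the refugee counts; whether and how the study normalized the two features is not stated in the abstract.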