
Unsupervised machine learning and pandemics spread: the case of COVID-19
Author(s) -
Roberto Carlos Lyra da Silva,
Fernando Xavier,
Antônio Mauro Saraiva,
Carlos Eduardo Cugnasca
Publication year - 2020
Language(s) - English
Resource type - Conference proceedings
DOI - 10.5753/sbcas.2020.11548
Subject(s) - pandemic, covid-19, cluster analysis, computer science, unsupervised learning, artificial intelligence, machine learning, hierarchical clustering, population, data mining, geography, disease, medicine, infectious disease (medical specialty), environmental health, pathology, virology, outbreak, programming language
Epidemics have severe impacts on people's health. COVID-19 has infected more than 3 million people in 3 months. In this work, we explore the use of unsupervised machine learning to evaluate and monitor the spread of the disease worldwide at three points in time: January, February, and March 2020. In addition to features describing the spread of the disease, we consider the Human Development Index (HDI), population density, and age structure. We determine the number of clusters using the elbow method and agglomerative clustering, then implement and evaluate the k-means algorithm with 3, 4, and 5 clusters. We conclude that four clusters best represent the data, analyze how the clusters evolve over time, and discuss the impacts on each cluster depending on the measures adopted.
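The abstract names a three-step pipeline: estimate the number of clusters with the elbow method, cross-check it with agglomerative clustering, then run k-means for k = 3, 4, and 5. The sketch below illustrates those steps in plain NumPy under several assumptions that the abstract does not confirm: the data are synthetic (a hypothetical 40-row, 4-feature table, not the authors' dataset), features are z-score standardized, Ward linkage is used for the agglomerative step, and the dendrogram is cut just before its largest jump in merge cost. All function names are illustrative and do not come from the paper.

```python
import numpy as np


def synthetic_countries(n_per_group=10, seed=0):
    """HYPOTHETICAL stand-in for a country-by-feature table (NOT the paper's data):
    4 well-separated groups of n_per_group rows with 4 numeric features each."""
    rng = np.random.default_rng(seed)
    centers = 10.0 * np.eye(4)  # 4 equidistant group centers
    X = np.vstack([c + rng.normal(0.0, 0.5, size=(n_per_group, 4)) for c in centers])
    groups = np.repeat(np.arange(4), n_per_group)  # true group of each row
    return X, groups


def standardize(X):
    """Z-score each column so features on different scales (e.g. HDI vs. density) weigh equally."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def sq_dists(X, C):
    """Squared Euclidean distance from every row of X to every row of C."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)


def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: each new center is sampled with probability proportional
    to its squared distance from the nearest center chosen so far."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = sq_dists(X, np.array(centers)).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)


def kmeans(X, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's algorithm with restarts; returns (labels, centers, inertia) of the best run."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_init):
        centers = kmeans_pp_init(X, k, rng)
        for _ in range(max_iter):
            labels = sq_dists(X, centers).argmin(axis=1)
            # Move each center to the mean of its points (keep it if its cluster is empty).
            new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                            for j in range(k)])
            if np.allclose(new, centers):
                break
            centers = new
        d2 = sq_dists(X, centers)
        labels = d2.argmin(axis=1)
        inertia = d2[np.arange(len(X)), labels].sum()  # within-cluster sum of squares
        if best is None or inertia < best[2]:
            best = (labels, centers, inertia)
    return best


def elbow_curve(X, k_max=8):
    """Inertia for k = 1..k_max; the 'elbow' is where the decrease levels off."""
    return [kmeans(X, k)[2] for k in range(1, k_max + 1)]


def ward_merge_costs(X):
    """Naive agglomerative clustering with Ward's criterion (fine for a few dozen rows).
    Returns the cost of each of the n-1 merges, in merge order."""
    clusters = [[i] for i in range(len(X))]
    costs = []
    while len(clusters) > 1:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                gap = X[clusters[a]].mean(axis=0) - X[clusters[b]].mean(axis=0)
                cost = na * nb / (na + nb) * (gap ** 2).sum()  # increase in within-cluster SS
                if cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        costs.append(cost)
        clusters[a].extend(clusters.pop(b))  # b > a, so index a is unaffected
    return np.array(costs)


def suggest_k(merge_costs):
    """ASSUMED heuristic: cut the dendrogram just before its largest jump in merge cost."""
    n = len(merge_costs) + 1
    return n - 1 - int(np.argmax(np.diff(merge_costs)))


X, groups = synthetic_countries()
Z = standardize(X)

inertias = elbow_curve(Z, k_max=6)
k_ward = suggest_k(ward_merge_costs(Z))
results = {k: kmeans(Z, k) for k in (3, 4, 5)}  # the k values compared in the abstract

print("elbow inertias, k=1..6:", np.round(inertias, 2))
print("k suggested by Ward dendrogram:", k_ward)
for k, (labels, _, inertia) in results.items():
    print(f"k={k}: cluster sizes {np.bincount(labels, minlength=k)}, inertia {inertia:.2f}")
```

The loops are written out only to keep the sketch dependency-free. For a real country-level table, scikit-learn's `KMeans` and SciPy's `scipy.cluster.hierarchy.linkage(..., method="ward")` would be the usual replacements.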