Balancing effort and benefit of K-means clustering algorithms in Big Data realms


Open Access


Author(s) - Joaquín Pérez-Ortega, Nelva Nely Almanza-Ortega, David Romero

Publication year - 2018

Publication title - PLOS ONE

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.99

H-Index - 332

ISSN - 1932-6203

DOI - 10.1371/journal.pone.0201874

Subject(s) - initialization, cluster analysis, computer science, algorithm, big data, convergence (economics), process (computing), cluster (spacecraft), contrast (vision), k means clustering, quality (philosophy), data mining, artificial intelligence, philosophy, epistemology, economics, programming language, economic growth, operating system

In this paper we propose a criterion to balance the processing time and the solution quality of k-means cluster algorithms when applied to instances where the number n of objects is big. The majority of the known strategies aimed to improve the performance of k-means algorithms are related to the initialization or classification steps. In contrast, our criterion applies in the convergence step, namely, the process stops whenever the number of objects that change their assigned cluster at any iteration is lower than a given threshold. Through computer experimentation with synthetic and real instances, we found that a threshold close to 0.03 n involves a decrease in computing time of about a factor 4/100, yielding solutions whose quality reduces by less than two percent. These findings naturally suggest the usefulness of our criterion in Big Data realms.
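The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain Lloyd-style k-means with random initialization, and the names `kmeans_early_stop`, `threshold_frac`, `max_iter` and `seed` are hypothetical. The only part taken from the abstract is the convergence test, which stops as soon as the number of objects that changed cluster in an iteration falls below a threshold such as 0.03 n. Clamping the threshold to at least 1 is an added assumption so that `threshold_frac=0.0` falls back to the usual "no object changed" criterion.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_early_stop(points, k, threshold_frac=0.03, max_iter=100, seed=0):
    """Lloyd-style k-means that stops once fewer than threshold_frac * n
    objects change their assigned cluster in an iteration."""
    rng = random.Random(seed)
    n = len(points)
    # Assumption: initial centroids are k distinct objects chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * n
    # Assumption: a limit of at least 1 recovers the classic stopping rule.
    limit = max(1.0, threshold_frac * n)
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Classification step: assign each object to its nearest centroid
        # and count how many objects changed cluster.
        changed = 0
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            if nearest != labels[i]:
                labels[i] = nearest
                changed += 1
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        # Convergence step: the criterion from the abstract.
        if changed < limit:
            break
    return labels, centroids, iterations


if __name__ == "__main__":
    # Synthetic demo data: three Gaussian blobs of 200 objects each.
    rng = random.Random(1)
    data = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in [(0, 0), (6, 6), (0, 6)] for _ in range(200)]
    _, _, iters_early = kmeans_early_stop(data, k=3, threshold_frac=0.03)
    _, _, iters_full = kmeans_early_stop(data, k=3, threshold_frac=0.0)
    print("iterations with 0.03 n threshold:", iters_early)
    print("iterations with classic criterion:", iters_full)
```

With the same seed both runs follow the same sequence of assignments, so the thresholded run can never take more iterations than the classic one. How much time is saved, and how much quality is lost, depends on the data; the figures quoted in the abstract come from the authors' own experiments, not from this sketch.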
