Selecting Appropriate Clustering Methods for Materials Science Applications of Machine Learning
Author(s) - Parker, Amanda J.; Barnard, Amanda S.
Publication year - 2019
Publication title - Advanced Theory and Simulations
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.068
H-Index - 17
ISSN - 2513-0390
DOI - 10.1002/adts.201900145
Subject(s) - cluster analysis, CURE data clustering algorithm, computer science, correlation clustering, canopy clustering algorithm, data mining, clustering high dimensional data, determining the number of clusters in a data set, data stream clustering, curse of dimensionality, outlier, benchmark (surveying), artificial intelligence, machine learning, pattern recognition (psychology), geodesy, geography
Based on a general definition of a cluster and the quality of a clustering result, a new method is presented for evaluating existing clustering algorithms or for undertaking clustering directly. The method is capable of predicting the number and type of clusters and outliers present in a data set, regardless of the complexity of the distribution of points. This algorithm, referred to as iterative label spreading, can recognize the characteristics expected of a successful clustering result before any clustering algorithm is applied, providing a type of hyper‐parameter optimization for clustering. The efficacy of the algorithm and the assessment of clustering results are both confirmed using large two‐dimensional synthetic benchmark data sets and small multidimensional data describing a set of silver nanoparticles. It is shown that the method is ideal for studying noisy data with high dimensionality and high variance, typical of data captured in materials science and nanoscience.
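
The abstract does not spell out the steps of iterative label spreading, so the following is only a rough illustration of the general idea of spreading a label through a data set one point at a time, not the authors' implementation. It assumes a simple nearest-first rule: starting from one labelled point, the unlabelled point closest to any labelled point is labelled next, and the distance at which it was reached is recorded. Large jumps in that sequence of distances hint at gaps between groups of points. The function names `nearest_first_ordering` and `split_at_largest_gaps`, the Euclidean metric, and the toy data are all assumptions made for this sketch.

```python
import math


def nearest_first_ordering(points, start=0):
    """Label points one at a time, always taking the unlabelled point that is
    closest to any already-labelled point (an assumed rule, for illustration).

    Returns (order, distances), where distances[k] is the distance at which
    order[k + 1] was reached.
    """
    n = len(points)
    # best[i]: distance from point i to the nearest labelled point so far
    best = [math.dist(points[start], p) for p in points]
    labelled = [False] * n
    labelled[start] = True
    order, distances = [start], []
    for _ in range(n - 1):
        nxt = min((i for i in range(n) if not labelled[i]), key=lambda i: best[i])
        order.append(nxt)
        distances.append(best[nxt])
        labelled[nxt] = True
        # The newly labelled point may bring other unlabelled points closer
        for i in range(n):
            if not labelled[i]:
                best[i] = min(best[i], math.dist(points[nxt], points[i]))
    return order, distances


def split_at_largest_gaps(order, distances, n_clusters):
    """Cut the ordering at the (n_clusters - 1) largest recorded distances."""
    ranked = sorted(range(len(distances)), key=lambda i: distances[i], reverse=True)
    cuts = sorted(ranked[: n_clusters - 1])
    clusters, begin = [], 0
    for c in cuts:
        clusters.append(order[begin : c + 1])
        begin = c + 1
    clusters.append(order[begin:])
    return clusters


# Hypothetical toy data: two well-separated groups of three points each
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
order, distances = nearest_first_ordering(points)
clusters = split_at_largest_gaps(order, distances, n_clusters=2)
```

In this toy case the single large jump in `distances` marks the step from the first group to the second. Note that the sketch takes the number of clusters as an input, whereas the abstract describes a method that predicts it; how the published algorithm does so is not covered here.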
