Open Access
Clustering Noisy Time Series
Author(s) - Anastasiia Yevhenivna Tkachenko, Liudmyla Olehivna Kyrychenko, Tamara Anatoliivna Radyvylova
Publication year - 2019
Publication title - sistemnì tehnologìï
Language(s) - English
Resource type - Journals
eISSN - 2707-7977
pISSN - 1562-9945
DOI - 10.34185/1562-9945-3-122-2019-15
Subject(s) - cluster analysis, dbscan, series (stratigraphy), pattern recognition (psychology), computer science, euclidean distance, data mining, artificial intelligence, time series, noise (video), correlation clustering, cure data clustering algorithm, mathematics, machine learning, paleontology, image (mathematics), biology
Clustering objects is one of the pressing problems in machine learning. Time series clustering is used both as an independent research technique and as a component of more complex data mining methods, such as rule detection, classification, and anomaly detection.

This work presents a comparative analysis of methods for clustering noisy time series. The clustering sample contained time series of various types, including atypical objects. Clustering was performed with the k-means and DBSCAN methods using various distance functions for time series.

A numerical experiment was conducted to investigate how k-means and DBSCAN perform on model time series with additive white noise. The sample consisted of m time series of several types: harmonic realizations, parabolic realizations, and "bursts".

DBSCAN with the Euclidean metric and the CID function gave the best results.

Analysis of the clustering results reveals the key differences between the methods. If the number of clusters can be determined in advance and atypical time series do not need to be separated, k-means gives fairly good results. If the number of clusters is unknown and atypical series must be isolated, DBSCAN is the advisable choice.
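
The abstract names the Euclidean metric and the CID function but does not define them. The sketch below assumes that "CID" refers to the complexity-invariant distance, which multiplies the Euclidean distance by the ratio of the two series' complexity estimates. The function names, the handling of constant series, and the pairwise matrix helper are illustrative choices, not taken from the paper.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def complexity_estimate(x):
    """Root of the summed squared differences between consecutive points."""
    return math.sqrt(sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1)))


def cid(x, y):
    """Complexity-invariant distance (assumed meaning of "CID").

    Euclidean distance scaled by max(CE) / min(CE), so two series of
    very different complexity are pushed further apart.
    """
    ce_x, ce_y = complexity_estimate(x), complexity_estimate(y)
    low, high = min(ce_x, ce_y), max(ce_x, ce_y)
    if low == 0.0:
        # Assumption: two constant series get factor 1; a constant series
        # versus a non-constant one is treated as infinitely far apart.
        factor = 1.0 if high == 0.0 else math.inf
    else:
        factor = high / low
    return euclidean(x, y) * factor


def distance_matrix(series, dist):
    """Symmetric matrix of pairwise distances under the given function."""
    n = len(series)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = dist(series[i], series[j])
    return matrix
```

A matrix built this way can be handed to any DBSCAN implementation that accepts precomputed distances; scikit-learn's `DBSCAN(metric="precomputed")` is one such option. The neighbourhood radius and minimum cluster size would still need to be tuned for the data, since the abstract does not report the values used.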