A novel clustering algorithm for time-series data based on precise correlation coefficient matching in the IoT

Haibo Li; Juncheng Tong

Open Access

Author(s) - Haibo Li, Juncheng Tong

Publication year - 2019

Publication title - Mathematical Biosciences and Engineering

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.451

H-Index - 45

eISSN - 1551-0018

pISSN - 1547-1063

DOI - 10.3934/mbe.2019331

Subject(s) - tree traversal, cluster analysis, subsequence, computer science, matching (statistics), algorithm, data mining, hierarchical clustering, correlation clustering, data stream clustering, correlation coefficient, CURE data clustering algorithm, mathematics, artificial intelligence, machine learning, statistics

In smart environments based on the Internet of Things (IoT), almost all object information collected by sensors is time-series data that records the behavior of the objects. Analyzing the correlation between different time series, rather than within a single time series, is more helpful for discovering behavioral relations between objects, and it has become an important issue in the IoT. To analyze such correlations, a clustering algorithm named CPCCM (clustering algorithm based on precise correlation coefficient matching) is presented. First, each initial sequence is split into a set of subsequences using a preset sliding window. Then, the correlation coefficient is computed for every pair of subsequences drawn from the subsequence sets of two sequences, and the pairs whose Pearson correlation coefficient passes a preset threshold are clustered. In the CPCCM, a cross-traversal strategy, which alternately searches the subsequences of the two subsequence sets, is introduced to improve search efficiency. To improve clustering efficiency, adjacent subsequences within each initial sequence are merged into a longer subsequence, which replaces them, if they appear in the same subsequence set. Finally, an analysis of practical electric power consumption data shows that the CPCCM is promising and can be applied in similar scenarios. Compared with the agglomerative hierarchical clustering algorithm, the major contributions of this work are that clustering quality is improved by the strategies of precise matching and cross-traversal, and algorithmic complexity is reduced by merging adjacent subsequences. Therefore, the CPCCM can be applied to analyze behavior between different objects in smart environments.
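The abstract describes the pipeline only in prose. The following is a minimal Python sketch of the windowing, threshold matching, and merging steps under several assumptions that are not taken from the paper: windows have a hypothetical fixed `width` and `step`; a pair is kept when its Pearson coefficient r satisfies r ≥ `threshold` (negative correlations are ignored); a brute-force all-pairs search stands in for the cross-traversal strategy, which the abstract does not specify; and the merging step is read, hypothetically, as chaining matched pairs that continue in step in both sequences. All function names (`sliding_windows`, `pearson`, `match_windows`, `merge_adjacent`) are illustrative, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers; the paper's actual implementation is not given in the abstract.


def sliding_windows(series: Sequence[float], width: int, step: int) -> List[Tuple[int, List[float]]]:
    """Split a series into (start_index, window) pairs with an assumed fixed width and step."""
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    return [(start, list(series[start:start + width]))
            for start in range(0, len(series) - width + 1, step)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-empty sequences.

    Assumption: a constant window (zero variance) is treated as uncorrelated (returns 0.0).
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def match_windows(a: Sequence[float], b: Sequence[float], width: int, step: int,
                  threshold: float) -> List[Tuple[int, int, float]]:
    """Return (start_a, start_b, r) for every window pair with r >= threshold.

    Brute-force all-pairs search; it stands in for the paper's cross-traversal
    strategy, which is not reproduced here.
    """
    windows_b = sliding_windows(b, width, step)
    matches = []
    for start_a, xa in sliding_windows(a, width, step):
        for start_b, xb in windows_b:
            r = pearson(xa, xb)
            if r >= threshold:
                matches.append((start_a, start_b, r))
    return matches


def merge_adjacent(matches: List[Tuple[int, int, float]], width: int,
                   step: int) -> List[Tuple[int, int, int]]:
    """Hypothetical merging step: chain matched pairs (sa, sb), (sa+step, sb+step), ...
    into one longer span per chain, returned as (start_a, start_b, length).

    Merged spans are not re-checked against the threshold.
    """
    pairs = {(sa, sb) for sa, sb, _ in matches}
    merged = []
    for sa, sb in sorted(pairs):
        if (sa - step, sb - step) in pairs:
            continue  # not the head of a chain; already covered by an earlier span
        end_a, end_b = sa, sb
        while (end_a + step, end_b + step) in pairs:
            end_a, end_b = end_a + step, end_b + step
        merged.append((sa, sb, end_a - sa + width))
    return merged


if __name__ == "__main__":
    a = [1, 2, 3, 4, 5, 6, 3, 2, 1]
    b = [10, 20, 30, 1, 2, 3, 9, 5, 1]
    m = match_windows(a, b, width=3, step=3, threshold=0.9)
    print(merge_adjacent(m, width=3, step=3))  # [(0, 0, 9), (0, 3, 3), (3, 0, 3)]
```

In the demo, the three aligned window pairs at offsets 0, 3, and 6 collapse into one span of length 9, which illustrates how merging reduces the number of pairs to handle. The all-pairs search costs O(n·m) correlation evaluations for n and m windows; reducing that cost is the role the abstract assigns to cross-traversal.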
