A novel clustering algorithm for time-series data based on precise correlation coefficient matching in the IoT
Author(s) - Hai Bo Li, Jun Cheng Tong
Publication year - 2019
Publication title - Mathematical Biosciences and Engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.451
H-Index - 45
eISSN - 1551-0018
pISSN - 1547-1063
DOI - 10.3934/mbe.2019331
Subject(s) - tree traversal, cluster analysis, subsequence, computer science, algorithm, matching (statistics), series (stratigraphy), set (abstract data type), data mining, sequence (biology), hierarchical clustering, correlation clustering, data stream clustering, cure data clustering algorithm, mathematics, artificial intelligence, statistics, mathematical analysis, paleontology, genetics, bounded function, biology, programming language
In smart environments based on the Internet of Things (IoT), almost all of the object information collected by sensors is time-series data that records the behavior of the objects. Analyzing the correlation between different time series, rather than within a single time series, is more helpful for discovering their behavioral relations, and this has become an important current issue in the IoT. To analyze this correlation, a clustering algorithm named CPCCM (clustering algorithm based on precise correlation coefficient matching) is presented. First, each initial sequence is split into a set of subsequences using a preset sliding window. Then, the correlation coefficients for any pair of subsequence sets from two sequences are computed. Those pairs that pass a preset Pearson correlation coefficient threshold are clustered. In the CPCCM, a cross-traversal strategy is introduced to improve the search efficiency: it alternately searches the subsequences in the two subsequence sets. To improve the clustering efficiency, adjacent subsequences in each initial sequence are merged into a longer subsequence, which replaces them, if they appear in the same subsequence set. Finally, an analysis of practical electric power consumption data shows that the CPCCM is promising and can be applied in similar scenarios. Compared with the agglomerative hierarchical clustering algorithm, the major contributions of this work are that the clustering quality is improved by the strategy of precise matching and cross-traversal, and that the complexity of the algorithm is reduced by merging adjacent subsequences. Therefore, the CPCCM can be applied to analyze behavior between different objects in smart environments.
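The pipeline described in the abstract (sliding-window splitting, Pearson-threshold matching, merging of adjacent subsequences) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `step` parameter, and the exhaustive pairwise search are assumptions made for illustration, and the exhaustive search stands in for the cross-traversal strategy, which the abstract does not specify in enough detail to reproduce.

```python
import numpy as np

def split_windows(seq, width, step=1):
    """Return (start_index, subsequence) pairs produced by a sliding window."""
    seq = np.asarray(seq, dtype=float)
    return [(s, seq[s:s + width]) for s in range(0, len(seq) - width + 1, step)]

def pearson(a, b):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def matching_pairs(x, y, width, threshold, step=1):
    """Start-index pairs (i, j) whose windows correlate at or above threshold.

    Assumption: plain exhaustive search over all window pairs, used here
    in place of the paper's cross-traversal strategy.
    """
    wx = split_windows(x, width, step)
    wy = split_windows(y, width, step)
    return [(i, j) for i, a in wx for j, b in wy if pearson(a, b) >= threshold]

def merge_adjacent(starts, width, step=1):
    """Merge runs of adjacent window starts into (start, end) ranges, end exclusive."""
    merged = []
    for s in sorted(set(starts)):
        if merged and s - merged[-1][1] <= step:
            merged[-1][1] = s          # extend the current run
        else:
            merged.append([s, s])      # begin a new run
    return [(first, last + width) for first, last in merged]
```

Under these assumptions, `matching_pairs(x, y, width, threshold)` yields the matched window positions for two sequences, and `merge_adjacent` applied to the matched starts of one sequence collapses neighboring windows into longer segments, which reduces the number of subsequences that later steps must handle.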
