A MapReduce‐based parallel K‐means clustering for large‐scale CIM data verification
Author(s) - Deng Chuang, Liu Yang, Xu Lixiong, Yang Jie, Liu Junyong, Li Siguang, Li Maozhen
Publication year - 2015
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.3580
Subject(s) - computer science , cluster analysis , cloud computing , software deployment , data intensive computing , distributed computing , data exchange , computer cluster , scale (ratio) , parallel computing , big data , computation , data mining , database , algorithm , grid computing , operating system , physics , geometry , mathematics , quantum mechanics , machine learning , grid
Summary The Common Information Model (CIM) is heavily used in electric power grids for data exchange among auxiliary systems such as communication, monitoring, and marketing systems. With the rapid deployment of digitalized devices in electric power networks, the volume of data grows continuously, which makes verification of CIM data a challenging issue. This paper presents a parallel K‐means clustering algorithm for large‐scale CIM data verification. The parallel K‐means builds on the MapReduce computing model, which has been widely adopted for data‐intensive applications. A genetic algorithm‐based load‐balancing scheme is designed to balance the workloads among heterogeneous computing nodes and further improve computation efficiency. The performance of the parallel K‐means is first evaluated on a small‐scale in‐house MapReduce cluster, then on a commercial cloud computing platform, and finally in large‐scale simulated MapReduce environments. Both the experimental and the simulation results show that the parallel K‐means significantly reduces CIM data‐verification time compared with sequential K‐means clustering while maintaining a high level of precision in data verification. Copyright © 2015 John Wiley & Sons, Ltd.
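
The summary does not give the authors' implementation, so the following is only a minimal single-process sketch of how one K-means iteration is commonly split into a map step and a reduce step. It is not the paper's code: it does not use Hadoop or any MapReduce framework, it omits the genetic-algorithm load balancing entirely, and all names (`map_phase`, `reduce_phase`, `kmeans`) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points: Iterable[Point],
              centroids: List[Point]) -> Iterator[Tuple[int, Point]]:
    """Map step: emit (nearest centroid index, point) for every point.

    In a real MapReduce job each mapper would see only its own data split.
    """
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: squared_distance(p, centroids[i]))
        yield nearest, p


def reduce_phase(pairs: Iterable[Tuple[int, Point]],
                 centroids: List[Point]) -> List[Point]:
    """Reduce step: average the points grouped under each centroid index.

    A centroid that received no points keeps its previous position.
    """
    groups = defaultdict(list)
    for cid, p in pairs:
        groups[cid].append(p)
    updated = list(centroids)
    for cid, members in groups.items():
        dim = len(members[0])
        updated[cid] = tuple(
            sum(m[d] for m in members) / len(members) for d in range(dim)
        )
    return updated


def kmeans(points: List[Point], centroids: List[Point],
           max_iter: int = 20) -> List[Point]:
    """Repeat map and reduce until the centroids stop changing."""
    for _ in range(max_iter):
        updated = reduce_phase(map_phase(points, centroids), centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
    print(kmeans(data, [(0.0, 0.0), (10.0, 10.0)]))
```

The map step is the part that parallelizes naturally, because each point is assigned independently of the others; only the per-cluster averaging needs the grouped results.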
