A MapReduce‐based parallel K‐means clustering for large‐scale CIM data verification

Author(s) - Deng Chuang, Liu Yang, Xu Lixiong, Yang Jie, Liu Junyong, Li Siguang, Li Maozhen

Publication year - 2015

Publication title - Concurrency and Computation: Practice and Experience

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.309

H-Index - 67

eISSN - 1532-0634

pISSN - 1532-0626

DOI - 10.1002/cpe.3580

Subject(s) - computer science, cluster analysis, cloud computing, software deployment, data intensive computing, distributed computing, data exchange, computer cluster, scale (ratio), parallel computing, big data, computation, data mining, database, algorithm, grid computing, operating system, physics, geometry, mathematics, quantum mechanics, machine learning, grid

Summary - The Common Information Model (CIM) is widely used in electric power grids for data exchange among auxiliary systems such as communication, monitoring, and marketing systems. With the rapid deployment of digitalized devices in electric power networks, the volume of data grows continuously, which makes verification of CIM data a challenging issue. This paper presents a parallel K‐means clustering algorithm for large‐scale CIM data verification. The parallel K‐means builds on the MapReduce computing model, which has been widely adopted for data‐intensive applications. A genetic algorithm‐based load‐balancing scheme is designed to balance the workloads among heterogeneous computing nodes and further improve computation efficiency. The performance of the parallel K‐means is first evaluated on a small‐scale in‐house MapReduce cluster, then on a commercial cloud computing platform, and finally in large‐scale simulated MapReduce environments. Both the experimental and the simulation results show that the parallel K‐means significantly reduces CIM data‐verification time compared with sequential K‐means clustering, while achieving a high level of precision in data verification. Copyright © 2015 John Wiley & Sons, Ltd.
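The summary does not spell out how K‐means is mapped onto MapReduce, so the following is only a minimal, single‐process sketch of the usual formulation, not the authors' implementation. It assumes the map step assigns each point in a data split to its nearest centroid and the reduce step averages the points assigned to each centroid. The function names (`map_phase`, `reduce_phase`, `kmeans_mapreduce`) and the sample data are hypothetical. The genetic algorithm‐based load balancing is not modeled.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Map: for each point in one data split, emit (nearest centroid index, point)."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, p


def reduce_phase(pairs, centroids):
    """Reduce: group points by centroid index and average them.

    A centroid that receives no points keeps its previous position.
    """
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    updated = list(centroids)
    for idx, members in groups.items():
        updated[idx] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return updated


def kmeans_mapreduce(splits, centroids, max_iter=20, tol=1e-9):
    """Repeat map and reduce until no centroid moves by more than tol."""
    for _ in range(max_iter):
        # In a real cluster each split would be handled by a separate mapper;
        # here the splits are simply processed one after another.
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        updated = reduce_phase(pairs, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids


# Hypothetical toy data: two splits, two initial centroids.
splits = [[(1.0, 1.0), (1.0, 2.0)], [(9.0, 9.0), (9.0, 10.0)]]
result = kmeans_mapreduce(splits, [(0.0, 0.0), (10.0, 10.0)])
print(result)
```

In this formulation each iteration is one MapReduce job, and only the small list of centroids has to be shared between iterations, which is why the approach suits data sets too large for a single node.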
