A Distributed Weighted Possibilistic c-Means Algorithm for Clustering Incomplete Big Sensor Data
Author(s) - Qingchen Zhang, Zhikui Chen
Publication year - 2014
Publication title - International Journal of Distributed Sensor Networks
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.324
H-Index - 53
eISSN - 1550-1477
pISSN - 1550-1329
DOI - 10.1155/2014/430814
Subject(s) - computer science, cluster analysis, partition (number theory), big data, data mining, set (abstract data type), CURE data clustering algorithm, missing data, algorithm, data set, canopy clustering algorithm, cloud computing, data stream clustering, fuzzy clustering, artificial intelligence, machine learning, mathematics, programming language, operating system, combinatorics
The possibilistic c-means clustering algorithm (PCM) has emerged as an important technique for pattern recognition and data analysis. Owing to the presence of many missing values, however, PCM struggles to produce a good clustering result in real time. This paper proposes a distributed weighted possibilistic c-means clustering algorithm (DWPCM), developed in three steps. First, the partial distance strategy is applied to PCM (PDPCM) to calculate the distance between any two objects in an incomplete data set. Second, a weighted PDPCM algorithm (WPCM) is designed to reduce the corrupting influence of missing values by assigning low weights to incomplete data objects. Finally, to improve the clustering speed of WPCM, cloud computing technology is used to optimize it, yielding DWPCM, which is based on MapReduce. The experimental results demonstrate that the proposed algorithms can efficiently produce an appropriate partition of incomplete big sensor data.
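The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The partial distance and the PCM typicality formula follow their standard textbook forms. The weighting rule (fraction of observed features) and all function names are assumptions made here for illustration, since the abstract does not specify them. The map and reduce steps are plain Python functions that only mimic the MapReduce pattern. Missing values are represented as NaN.

```python
import math
from functools import reduce


def partial_distance_sq(x, v):
    """Squared partial distance: sum over observed features, rescaled by s / (observed count)."""
    observed = [(a, b) for a, b in zip(x, v) if not math.isnan(a)]
    if not observed:
        raise ValueError("object has no observed features")
    total = sum((a - b) ** 2 for a, b in observed)
    return len(x) / len(observed) * total


def typicality(d_sq, eta, m=2.0):
    """Standard PCM typicality for squared distance d_sq and cluster scale eta."""
    return 1.0 / (1.0 + (d_sq / eta) ** (1.0 / (m - 1.0)))


def completeness_weight(x):
    """Hypothetical weight: fraction of observed features (an assumption, not from the paper)."""
    return sum(not math.isnan(a) for a in x) / len(x)


def map_partial_sums(chunk, v, eta, m=2.0):
    """Map step: per-feature weighted numerator and denominator sums for one data chunk."""
    num = [0.0] * len(v)
    den = [0.0] * len(v)
    for x in chunk:
        coeff = completeness_weight(x) * typicality(partial_distance_sq(x, v), eta, m) ** m
        for j, a in enumerate(x):
            if not math.isnan(a):  # missing features contribute nothing
                num[j] += coeff * a
                den[j] += coeff
    return num, den


def reduce_partial_sums(p, q):
    """Reduce step: add the partial sums produced by two mappers."""
    return ([a + b for a, b in zip(p[0], q[0])],
            [a + b for a, b in zip(p[1], q[1])])


def update_center(chunks, v, eta, m=2.0):
    """One center update for a single cluster, combining all chunks."""
    num, den = reduce(reduce_partial_sums,
                      (map_partial_sums(c, v, eta, m) for c in chunks))
    # Keep the old coordinate if no object observed that feature.
    return [n / d if d > 0 else old for n, d, old in zip(num, den, v)]
```

Because the center update is a ratio of sums, each chunk can be processed independently and the partial sums added afterwards, which is what makes the computation fit the MapReduce pattern. Splitting the data into more chunks should change the result only by floating-point rounding.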