PFCA: An influence‐based parallel fuzzy clustering algorithm for large complex networks
Author(s) - Bhatia Vandana, Rani Rinkle
Publication year - 2018
Publication title - Expert Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.365
H-Index - 38
eISSN - 1468-0394
pISSN - 0266-4720
DOI - 10.1111/exsy.12295
Subject(s) - computer science, cluster analysis, fuzzy clustering, disjoint sets, data mining, complex network, algorithm, correlation clustering, artificial intelligence, mathematics, combinatorics, world wide web
Clustering reveals the patterns present in networks and thus yields useful insights. In real‐world complex networks, analysing the structure of the network plays a vital role in clustering. Most existing clustering algorithms identify only disjoint clusters and do not consider the structure of the network. Moreover, their results lack consistency and precision. This paper presents an efficient parallel fuzzy clustering algorithm, named “PFCA”, for large complex networks using Hadoop and Pregel (a parallel processing framework for large graphs). The proposed algorithm first selects candidate cluster heads on the basis of their influence in the network and then determines the number of clusters by analysing the graph structure using the PageRank algorithm. It identifies both disjoint and fuzzy clusters efficiently and computes membership only for those vertices that belong to more than one cluster. Performance is validated on six real‐life networks with up to billions of connections. The experimental results show that the proposed algorithm scales linearly with the size of the network. They also show that the proposed algorithm is efficient and has high precision compared with other state‐of‐the‐art fuzzy clustering algorithms in terms of F score and modularity.
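The abstract does not spell out how influence is computed, so the following is only a minimal single-machine sketch of the general idea: rank vertices with a standard PageRank power iteration and take the top-ranked ones as candidate cluster heads. The function names `pagerank` and `candidate_heads`, the damping factor of 0.85, the fixed iteration count, and the choice of a fixed `k` are all assumptions made for illustration. This is not the authors' Hadoop/Pregel implementation, and it does not reproduce how PFCA determines the number of clusters.

```python
from typing import Dict, Hashable, List

# Adjacency-list graph: every vertex appears as a key, including sinks.
Graph = Dict[Hashable, List[Hashable]]


def pagerank(graph: Graph, damping: float = 0.85, iterations: int = 50) -> Dict[Hashable, float]:
    """Standard PageRank by power iteration (illustrative, not PFCA itself)."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by vertices with no out-edges is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not graph[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for v in nodes:
            if graph[v]:
                share = damping * rank[v] / len(graph[v])
                for w in graph[v]:
                    new_rank[w] += share
        rank = new_rank
    return rank


def candidate_heads(graph: Graph, k: int) -> List[Hashable]:
    """Hypothetical helper: the k highest-ranked vertices as candidate heads."""
    rank = pagerank(graph)
    return sorted(rank, key=rank.get, reverse=True)[:k]
```

On a toy graph of two triangles joined by a single edge, the two bridge endpoints receive the highest rank and would be picked as the candidate heads for `k = 2`. In a Pregel-style system the inner loop would instead run per vertex, with rank shares sent as messages along out-edges in each superstep.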