
Distributed Community Detection based on Apache Spark using Multi Label Propagation for Digital Social Networks
Author(s) -
Satya Keerthi Gorripati,
V. Prasanna Kumari
Publication year - 2018
Publication title -
International Journal of Engineering and Technology
Language(s) - English
Resource type - Journals
ISSN - 2227-524X
DOI - 10.14419/ijet.v7i4.5.20016
Subject(s) - computer science , spark (programming language) , scalability , joins , big data , distributed computing , task (project management) , set (abstract data type) , data mining , database , management , economics , programming language
Organizations, governments and individuals (OGI) have popularized Digital Social Networks (DSNs), which reduce the processing time of social-aware tasks. To support community-based communication, each social-aware task must identify its community group. The identified group then uses the task to extend the benefits of the DSN to its customers or citizens. As a result, community detection algorithms have played a significant role in the literature. However, existing algorithms face challenges in performance and scalability. This paper therefore presents a distributed community detection algorithm built on Apache Spark's Resilient Distributed Dataset (RDD) abstraction and implemented in Scala. Apache Spark offers ease of coding, high performance and an interactive mode, and it avoids the disk input/output bottlenecks of Hadoop MapReduce. The resulting distributed community detection platform reduces computational cost by applying transformations, aggregations and joins. The experimental results show that the proposed framework achieves high accuracy on both real-world and synthetic networks.
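To make the "transformations, aggregations and joins" pattern concrete, here is a minimal sketch of one label propagation loop written as those three stages. It is not the authors' method: the paper uses multi-label propagation on Spark RDDs in Scala, whereas this sketch assumes plain single-label propagation (each vertex holds exactly one label) on in-memory Python collections, with ties broken by the smallest label. The function name `label_propagation` and the `max_iter` parameter are hypothetical.

```python
from collections import Counter, defaultdict


def label_propagation(edges, max_iter=20):
    """Synchronous single-label propagation on an undirected graph (illustrative sketch).

    Each round mimics an RDD pipeline: a join of edges with current labels,
    a per-vertex aggregation of neighbor labels, and a map that picks the
    most frequent label (the smallest label wins ties).
    """
    # Symmetrize the undirected edge list, as a flatMap over an edge RDD would.
    directed = [(u, v) for a, b in edges for u, v in ((a, b), (b, a))]
    vertices = {v for e in edges for v in e}
    labels = {v: v for v in vertices}  # every vertex starts in its own community

    for _ in range(max_iter):
        # "Join": pair each edge with its source vertex's current label.
        messages = ((dst, labels[src]) for src, dst in directed)
        # "Aggregate": count incoming labels per destination vertex.
        counts = defaultdict(Counter)
        for dst, label in messages:
            counts[dst][label] += 1
        # "Map": keep the most frequent label; break ties by the smallest label.
        new_labels = {
            v: min(counts[v].items(), key=lambda kv: (-kv[1], kv[0]))[0]
            for v in vertices
        }
        if new_labels == labels:  # converged: no vertex changed its label
            break
        labels = new_labels
    return labels


# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(sorted(label_propagation(edges).items()))
# → [(1, 1), (2, 1), (3, 1), (4, 3), (5, 3), (6, 3)]
```

On a Spark cluster, the same three stages would map onto RDD operations such as `join` and `reduceByKey`, so each round becomes a distributed shuffle rather than a Python loop. A multi-label variant would let each vertex retain several labels above a belonging threshold so that communities can overlap; that extension is not shown here.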