Open Access
A study and Performance Comparison of MapReduce and Apache Spark on Twitter Data on Hadoop Cluster
Author(s) -
Md. Nowraj Farhan,
Ahsan Habib,
Md. Arshad Ali
Publication year - 2018
Publication title -
International Journal of Information Technology and Computer Science
Language(s) - English
Resource type - Journals
eISSN - 2074-9015
pISSN - 2074-9007
DOI - 10.5815/ijitcs.2018.07.07
Subject(s) - spark (programming language) , computer science , big data , scalability , computer cluster , operating system , fault tolerance , distributed file system , cluster (spacecraft) , computation , parallel computing , database , algorithm , programming language
We explore Apache Spark, a relatively new tool for analyzing big data that lets programmers perform in-memory computation on large data sets in a fault-tolerant manner. MapReduce is a high-performance distributed big data programming framework that is preferred by most big data analysts and has been available for a long time with very good documentation. The purpose of this project was to compare the scalability of open-source distributed data management systems such as Apache Hadoop on small and medium data sets, and to compare its performance against that of Apache Spark, a scalable distributed in-memory data processing engine. For this comparison, experiments were run on data sets ranging from 5 GB to 43 GB, both on a single machine and on a Hadoop cluster. The results show that the cluster outperforms a single machine by a wide margin. Apache Spark outperforms MapReduce by a dramatic margin, and as the data grows, Spark becomes more reliable and fault tolerant. We also observed an interesting result: increasing the number of blocks on the Hadoop Distributed File System (HDFS) also increases the run-time of both the MapReduce and Spark programs, and even in this case Spark performs far better than MapReduce. This demonstrates that Spark is a possible replacement for MapReduce in the near future.
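To make the MapReduce programming model concrete, here is a minimal, illustrative sketch in plain Python (no Hadoop or Spark required) of the map, shuffle, and reduce phases a MapReduce job runs, applied to a hypothetical hashtag-count workload over tweets. The workload and all function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `hashtag_count`) are assumptions for illustration only; the abstract does not say which jobs the authors ran. In Spark the same computation would typically be expressed as a chain of RDD transformations such as `flatMap` followed by `reduceByKey`, and Spark can cache intermediate data sets in memory across operations, whereas Hadoop MapReduce spills map output to local disk and writes each job's output to HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(tweet: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical mapper: emit (hashtag, 1) for every hashtag in one tweet."""
    for token in tweet.split():
        # Treat any whitespace-separated token starting with '#' as a hashtag.
        if token.startswith("#") and len(token) > 1:
            yield token.lower(), 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group every emitted value under its key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Hypothetical reducer: sum the partial counts for one key."""
    return key, sum(values)


def hashtag_count(tweets: Iterable[str]) -> Dict[str, int]:
    # Each phase runs to completion before the next one starts, mirroring the
    # staged execution of a MapReduce job. In this sketch the intermediate
    # results simply stay in Python lists and dicts.
    mapped = [pair for tweet in tweets for pair in map_phase(tweet)]
    shuffled = shuffle_phase(mapped)
    # Keys are processed in sorted order, as a MapReduce reducer sees them.
    return dict(reduce_phase(k, v) for k, v in sorted(shuffled.items()))


if __name__ == "__main__":
    sample = [
        "Loving #BigData and #Spark",
        "#spark is fast",
        "Hadoop #BigData cluster",
    ]
    print(hashtag_count(sample))
```

Running the script prints `{'#bigdata': 2, '#spark': 2}`. Lower-casing the hashtags in the mapper is a design choice of this sketch so that `#Spark` and `#spark` are counted as the same key.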
