
Performance Evaluation of Map Reduce vs. Spark framework on Amazon Machine Image for TeraSort Algorithm
Author(s) - Gangadhara Rao Kommu
Publication year - 2021
Publication title - International Journal for Research in Applied Science and Engineering Technology
Language(s) - English
Resource type - Journals
ISSN - 2321-9653
DOI - 10.22214/ijraset.2021.35540
Subject(s) - speedup , computer science , spark (programming language) , sorting , java , implementation , generator (circuit theory) , parallel computing , algorithm , sorting algorithm , data mining , operating system , power (physics) , physics , quantum mechanics , programming language
TeraSort is one of Hadoop’s most widely used benchmarks. The Hadoop distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort performs the sort. We compare the TeraSort algorithm across distributed platforms under different resource configurations. Efficiency is measured by Compute Time, Data Read, Data Write, and Speedup. We conducted experiments using Hadoop MapReduce and Spark (Java). We empirically evaluate the performance of the TeraSort algorithm on Amazon EC2 Machine Images and demonstrate that it achieves a speedup of 2.4× to 3.95× compared with TeraSort for typical settings of interest.
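To make the benchmark and the speedup metric concrete, the following is a minimal single-machine sketch in plain Java. It is not the Hadoop or Spark implementation evaluated in the paper. It assumes the standard TeraSort record layout (100-byte records ordered by a 10-byte key) and the conventional definition of speedup as baseline time divided by candidate time. The class name `TeraSortSketch` and all method names are hypothetical and chosen for illustration only.

```java
import java.util.Arrays;
import java.util.Random;

// Illustrative sketch only: a local stand-in for the TeraGen + TeraSort pipeline.
public class TeraSortSketch {
    static final int KEY_BYTES = 10;      // assumed key length (TeraSort convention)
    static final int RECORD_BYTES = 100;  // assumed record length (TeraSort convention)

    // TeraGen-like step: produce n records filled with pseudo-random bytes.
    // This is NOT Hadoop's actual generator, only a placeholder for it.
    static byte[][] generate(int n, long seed) {
        Random rng = new Random(seed);
        byte[][] records = new byte[n][RECORD_BYTES];
        for (byte[] record : records) {
            rng.nextBytes(record);
        }
        return records;
    }

    // Compare two records by their first KEY_BYTES bytes, treated as unsigned.
    static int compareKeys(byte[] a, byte[] b) {
        for (int i = 0; i < KEY_BYTES; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return 0;
    }

    // TeraSort-like step: return a copy of the records ordered by key.
    static byte[][] sort(byte[][] records) {
        byte[][] out = records.clone();
        Arrays.sort(out, TeraSortSketch::compareKeys);
        return out;
    }

    // Validation step: check that adjacent keys are in non-decreasing order.
    static boolean isSorted(byte[][] records) {
        for (int i = 1; i < records.length; i++) {
            if (compareKeys(records[i - 1], records[i]) > 0) {
                return false;
            }
        }
        return true;
    }

    // Speedup under the conventional definition: baseline time / candidate time.
    static double speedup(double baselineSeconds, double candidateSeconds) {
        return baselineSeconds / candidateSeconds;
    }

    public static void main(String[] args) {
        byte[][] input = generate(100_000, 42L);
        long start = System.nanoTime();
        byte[][] sorted = sort(input);
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.println("sorted=" + isSorted(sorted) + " in " + seconds + " s");
    }
}
```

Under this definition, a hypothetical run that takes 10 s on one framework and 4 s on the other would give `speedup(10.0, 4.0)`, that is, 2.5×. These timings are made-up inputs for illustration, not measurements from the paper.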