Open Access

A heuristic approach for load balancing the FP-growth algorithm on MapReduce

Author(s) - Sikha Bagui, Keerthi Devulapalli, John W. Coffey

Publication year - 2020

Publication title - Array

Language(s) - English

Resource type - Journals

ISSN - 2590-0056

DOI - 10.1016/j.array.2020.100035

Subject(s) - computer science, load balancing (electrical power), heuristic, association rule learning, execution time, big data, cluster (spacecraft), data mining, parallel computing, algorithm, artificial intelligence, mathematics, operating system, geometry, grid

Frequent itemset discovery is an important step in association rule mining. The Frequent Pattern (FP) growth algorithm, often used to discover frequent itemsets, does not scale directly to today’s Big Data, especially large sparse datasets, so it needs to be distributed and parallelized. Parallel FP-growth (PFP) is a parallel implementation of the FP-growth algorithm on Hadoop’s MapReduce execution framework. Although PFP scales to large datasets, it suffers from load imbalance across processing units. In this paper we propose a heuristic-based load-balancing strategy of lower-order complexity for the PFP algorithm, called Heuristic Based PFP (HBPFP). Our results show that HBPFP distributes the load more evenly across the Hadoop cluster nodes, runs faster than the PFP algorithm, and uses cluster resources more efficiently, especially for large sparse datasets.
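To make the load-imbalance problem concrete, the following is a minimal sketch of how grouping choices affect per-worker load. It is not the HBPFP heuristic, which the abstract does not specify; it only contrasts a naive contiguous split with a generic greedy "heaviest item to the lightest group" assignment. The per-item load estimates, the function names, and the number of groups are all hypothetical.

```python
import heapq


def contiguous_groups(loads, n_groups):
    """Split items into n_groups contiguous chunks, in the order given.

    Returns the total (hypothetical) load assigned to each group.
    """
    size = -(-len(loads) // n_groups)  # ceiling division
    totals = [0] * n_groups
    for i, load in enumerate(loads):
        totals[i // size] += load
    return totals


def greedy_groups(loads, n_groups):
    """Generic greedy balancing: place the heaviest remaining item
    into whichever group currently has the smallest total load.

    This is a textbook heuristic used for illustration only.
    """
    heap = [(0, g) for g in range(n_groups)]  # (current total, group id)
    heapq.heapify(heap)
    totals = [0] * n_groups
    for load in sorted(loads, reverse=True):
        total, g = heapq.heappop(heap)
        totals[g] = total + load
        heapq.heappush(heap, (totals[g], g))
    return totals


# Hypothetical per-item load estimates, heaviest first.
loads = [9, 8, 7, 6, 5, 4, 3, 2, 1]

print(contiguous_groups(loads, 3))  # [24, 15, 6]
print(greedy_groups(loads, 3))      # [16, 15, 14]
```

With these made-up loads, the contiguous split leaves one group with four times the work of another, while the greedy assignment keeps the groups within two units of each other. In a MapReduce job the slowest reducer determines completion time, which is why a more even distribution can reduce overall running time.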
