Open Access
A MapReduce-Based Parallel Mining for Frequent Itemset
Author(s) - Nian Liu, Junkang Guo
Publication year - 2019
Publication title - Journal of Physics: Conference Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.21
H-Index - 85
eISSN - 1742-6596
pISSN - 1742-6588
DOI - 10.1088/1742-6596/1302/2/022055
Subject(s) - association rule learning, computer science, data mining, task (project management), set (abstract data type), execution time, parallel computing, engineering, programming language, systems engineering
Frequent itemset mining is the most important step of association rule mining. Because datasets are large, many parallel mining methods have been introduced that partition the dataset or distribute the mining process in order to improve mining efficiency. In this paper, we propose MR-PrePost+, a parallel version of the PrePost+ algorithm based on MapReduce, to mine frequent itemsets. In our parallelization approach, the Reduce task performs a set of independent mining tasks, which eliminates data dependencies and computational dependencies between machines. The experimental results show that MR-PrePost+ achieves shorter running times than PFP, which is one of the best parallel algorithms for frequent itemset mining.
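The idea of giving each Reduce task an independent mining job can be illustrated with a small single-process sketch. This is not the authors' MR-PrePost+ implementation: the abstract does not describe how the data are partitioned, so the sketch assumes a PFP-style grouping in which every frequent item is assigned to a group and each transaction contributes one prefix per group. A brute-force subset counter stands in for PrePost+. All function names (`count_items`, `map_phase`, `reduce_phase`, `mine`) and the `n_groups` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_items(transactions):
    """First pass: support count of every single item."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def map_phase(transactions, rank, n_groups):
    """Assumed PFP-style map step: send each group one prefix per transaction.

    Items are ordered by rank (most frequent first). Scanning from the right,
    the first item seen for a group causes the prefix ending at that item to
    be emitted to that group.
    """
    shards = defaultdict(list)
    for t in transactions:
        items = sorted((i for i in set(t) if i in rank), key=rank.get)
        seen = set()
        for pos in range(len(items) - 1, -1, -1):
            gid = rank[items[pos]] % n_groups  # hypothetical group assignment
            if gid not in seen:
                seen.add(gid)
                shards[gid].append(items[:pos + 1])
    return shards


def reduce_phase(gid, prefixes, rank, n_groups, min_support):
    """Independent mining task for one group.

    Brute-force stand-in for PrePost+: counts every itemset whose
    lowest-ranked (last) item belongs to this group. It needs no data
    from any other group.
    """
    counts = Counter()
    for prefix in prefixes:
        for size in range(1, len(prefix) + 1):
            for combo in combinations(prefix, size):
                if rank[combo[-1]] % n_groups == gid:
                    counts[combo] += 1
    return {c: n for c, n in counts.items() if n >= min_support}


def mine(transactions, min_support, n_groups=2):
    """Driver that runs the map step, then each reduce task in turn."""
    counts = count_items(transactions)
    frequent = sorted(
        (i for i, c in counts.items() if c >= min_support),
        key=lambda i: (-counts[i], i),
    )
    rank = {item: r for r, item in enumerate(frequent)}
    shards = map_phase(transactions, rank, n_groups)
    result = {}
    for gid, prefixes in shards.items():
        # Each call could run on a different machine; results never overlap.
        result.update(reduce_phase(gid, prefixes, rank, n_groups, min_support))
    return result


if __name__ == "__main__":
    data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c"]]
    for itemset, support in sorted(mine(data, min_support=3).items()):
        print(itemset, support)
```

Because every itemset is counted only by the group that owns its last item, the reduce tasks in this sketch share no state, and the mined result does not depend on the number of groups chosen.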