A parallel C4.5 decision tree algorithm based on MapReduce
Author(s) - Mu Yashuang, Liu Xiaodong, Yang Zhihao, Liu Xiaolin
Publication year - 2017
Publication title - Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4015
Subject(s) - computer science, incremental decision tree, id3 algorithm, decision tree, tree (set theory), decision tree learning, tree traversal, algorithm, interval tree, node (physics), search tree, time complexity, partition (number theory), data mining, artificial intelligence, machine learning, mathematics, search algorithm, mathematical analysis, structural engineering, combinatorics, engineering
Summary - In supervised classification, large training data sets are common and decision trees are widely used. However, because of bottlenecks such as memory restrictions, time complexity, and data complexity, many supervised classifiers, including the classical C4.5 tree, cannot directly handle big data. One solution is to design a highly parallelized learning algorithm. Motivated by this, we propose a parallelized C4.5 decision tree algorithm based on MapReduce (MR‐C4.5‐Tree) with 2 parallelized methods for building the tree nodes. First, an information entropy‐based parallelized attribute selection method (MR‐A‐S), which operates on several subsets, is proposed to determine the best splitting attribute and the cut points. Then, a parallel data splitting method (MR‐D‐S) is presented to partition the training data into subsets. Finally, we introduce the MR‐C4.5‐Tree learning algorithm, which grows the tree in a top‐down recursive way. In addition, the depth of the constructed decision tree, the number of samples, and the maximal class probability in each tree node are used as termination conditions to avoid the over‐partitioning problem. Experimental studies show the feasibility and good performance of the proposed parallelized MR‐C4.5‐Tree algorithm.
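The attribute selection and termination ideas in the summary can be illustrated with a small sketch. This is not the authors' MR‐A‐S implementation: it imitates the map and reduce phases in plain Python with `functools.reduce`, handles only categorical attributes (no cut points for continuous ones), and uses the standard C4.5 gain ratio as the entropy-based criterion. The function names, the toy data, and the thresholds in `should_stop` are all hypothetical.

```python
from collections import Counter
from functools import reduce
from math import log2

def map_counts(partition):
    """Map step: count (attribute index, value, label) triples in one partition."""
    counts = Counter()
    for *features, label in partition:
        for i, value in enumerate(features):
            counts[(i, value, label)] += 1
    return counts

def reduce_counts(a, b):
    """Reduce step: merge the partial counts of two partitions."""
    return a + b

def entropy(freqs):
    """Shannon entropy (base 2) of a collection of non-negative counts."""
    freqs = list(freqs)
    total = sum(freqs)
    return -sum(f / total * log2(f / total) for f in freqs if f > 0)

def gain_ratios(counts, n_attributes):
    """C4.5 gain ratio of every categorical attribute, from merged counts."""
    # Every sample contributes exactly one triple for attribute 0,
    # so those triples recover the overall class distribution.
    class_totals = Counter()
    for (i, _, label), c in counts.items():
        if i == 0:
            class_totals[label] += c
    n = sum(class_totals.values())
    base = entropy(class_totals.values())
    ratios = {}
    for i in range(n_attributes):
        by_value = {}
        for (j, value, label), c in counts.items():
            if j == i:
                by_value.setdefault(value, Counter())[label] += c
        sizes = [sum(d.values()) for d in by_value.values()]
        conditional = sum(s / n * entropy(d.values())
                          for s, d in zip(sizes, by_value.values()))
        split_info = entropy(sizes)
        ratios[i] = (base - conditional) / split_info if split_info > 0 else 0.0
    return ratios

def should_stop(depth, class_counts, max_depth=10, min_samples=2,
                max_class_prob=0.95):
    """Termination test on depth, sample count, and maximal class probability.
    The default thresholds are illustrative assumptions, not the paper's values."""
    n = sum(class_counts.values())
    if n == 0:
        return True
    return (depth >= max_depth
            or n <= min_samples
            or max(class_counts.values()) / n >= max_class_prob)

# Toy training data split into two partitions: (outlook, temperature, label).
partitions = [
    [("sunny", "hot", "no"), ("sunny", "mild", "no"), ("rain", "mild", "yes")],
    [("overcast", "hot", "yes"), ("rain", "hot", "yes"), ("overcast", "mild", "yes")],
]
merged = reduce(reduce_counts, map(map_counts, partitions))
ratios = gain_ratios(merged, n_attributes=2)
best_attribute = max(ratios, key=ratios.get)
```

On this toy data the first attribute separates the classes perfectly, so it receives the larger gain ratio and would be chosen as the splitting attribute. In a real MapReduce job the map and reduce functions would run on distributed data blocks rather than in-memory lists.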
