Comparative Study of Binary Classification Methods to Analyze a Massive Dataset on Virtual Machine
Author(s) - Neelam Naik, Seema Purohit
Publication year - 2017
Publication title - Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2017.08.232
Subject(s) - computer science, spark (programming language), scalability, decision tree, random forest, tree (set theory), virtual machine, data mining, cloud computing, distributed computing environment, machine learning, id3 algorithm, incremental decision tree, artificial intelligence, big data, decision tree learning, distributed computing, database, mathematics, mathematical analysis, programming language, operating system
A massive dataset can be analyzed either by setting up a physical distributed environment or by renting a cloud-based one. The advantage of a cloud-based environment over a physical one is that it provides scalable virtual resources on demand, which makes it suitable for handling growth in data volume. Hidden patterns in data can provide a knowledge base for decision making, and statistical or data mining methods can be used to find them. Among the decision tree based classification algorithms that can be implemented in a distributed environment, an efficient algorithm can be selected using a few parameters, such as execution time, prediction accuracy, and the complexity of the tree structure. In this study, an Apache Hadoop-based distributed environment is established on a virtual machine, and Apache Spark is installed to execute machine learning algorithms. A comparative study of three binary classification methods (decision tree, gradient boosted tree, and random forest) is performed to judge their performance on these parameters. Random forest is found to perform best of the three algorithms on the dataset considered.
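The three comparison parameters named in the abstract (execution time, prediction accuracy, tree complexity) can be illustrated with a small measurement harness. The following is a minimal sketch, not the study's setup: it uses no Spark or Hadoop, substitutes toy pure-Python models (a depth-1 decision tree and a bagged ensemble of such trees as a stand-in for a random forest), omits gradient boosting, and runs on synthetic data. The abstract does not say how the three parameters are weighed against each other, so the final ranking (accuracy first, then time, then node count) is an assumption, as are all function and class names.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    accuracy: float  # fraction of test rows predicted correctly
    seconds: float   # wall-clock time for training plus prediction
    nodes: int       # total tree nodes, a proxy for structural complexity


def fit_stump(rows):
    """Fit a depth-1 tree: the (feature, threshold, left label) with fewest training errors."""
    best = None
    for f in range(len(rows[0][0])):
        for x, _ in rows:
            for left in (0, 1):
                errors = sum(
                    (left if xi[f] <= x[f] else 1 - left) != yi for xi, yi in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, f, x[f], left)
    return best[1:]


def predict_stump(stump, x):
    f, threshold, left = stump
    return left if x[f] <= threshold else 1 - left


def fit_forest(rows, n_trees, rng):
    """Bagging: train each stump on a bootstrap sample of the rows."""
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_trees)]


def predict_forest(forest, x):
    # Majority vote over the ensemble.
    votes = sum(predict_stump(s, x) for s in forest)
    return 1 if 2 * votes > len(forest) else 0


def evaluate(name, fit, predict, count_nodes, train, test):
    """Measure the three comparison parameters for one model."""
    start = time.perf_counter()
    model = fit(train)
    correct = sum(predict(model, x) == y for x, y in test)
    seconds = time.perf_counter() - start
    return Result(name, correct / len(test), seconds, count_nodes(model))


rng = random.Random(0)
# Synthetic data (not from the study): the label depends only on the first feature.
points = [(rng.random(), rng.random()) for _ in range(100)]
data = [(p, 1 if p[0] > 0.5 else 0) for p in points]
train, test = data[:60], data[60:]

results = [
    # A stump has one split node and two leaves, hence 3 nodes.
    evaluate("single stump", fit_stump, predict_stump, lambda m: 3, train, test),
    evaluate(
        "bagged stumps",
        lambda rows: fit_forest(rows, 5, rng),
        predict_forest,
        lambda m: 3 * len(m),
        train,
        test,
    ),
]

# Assumed ranking: higher accuracy, then lower time, then fewer nodes.
best = max(results, key=lambda r: (r.accuracy, -r.seconds, -r.nodes))
for r in results:
    print(r)
print("selected:", best.name)
```

On a real cluster the same three measurements would come from the distributed library's own models and evaluators; the point of the sketch is only that each candidate is scored on the same train/test split and the same three quantities before one is chosen.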