
A comparative study of Distributed Large Scale Data Mining Algorithms
Author(s) - Isha Sood, Varsha Sharma
Publication year - 2020
Publication title - BSSS Journal of Computer
Language(s) - English
Resource type - Journals
eISSN - 2582-4880
pISSN - 0975-7228
DOI - 10.51767/jc1102
Subject(s) - computer science, data mining, big data, scalability, data stream mining, construct (python library), scale (ratio), data science, data set, identification (biology), set (abstract data type), computation, concept mining, sampling (signal processing), database, artificial intelligence, algorithm, web mining, botany, filter (signal processing), biology, computer vision, programming language, physics, quantum mechanics, world wide web, web service
At its core, data mining is the processing of data to identify patterns and trends that can inform decisions and judgments. Data mining concepts have been in use for years, but the emergence of big data has made them even more widespread. In particular, scalable mining of such large data sets is a difficult problem that has attracted several recent research efforts. A few of these recent works use the MapReduce methodology to construct data mining models across the data set. In this article, we examine current approaches to large-scale data mining and compare their results with those of the MapReduce model. Based on our research, a data mining system that combines MapReduce and sampling is implemented and discussed.
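The abstract does not describe how the combined system works, so the following is only a minimal single-process sketch of the general idea of pairing sampling with a MapReduce-style computation. It is not the authors' implementation. The task (frequent item counting), the function names (`sample_records`, `partition`, `map_phase`, `reduce_phase`, `frequent_items`), and all parameters are assumptions chosen for illustration.

```python
import random
from collections import Counter
from functools import reduce


def sample_records(records, fraction, seed=0):
    """Bernoulli sampling: keep each record independently with probability `fraction`."""
    rng = random.Random(seed)
    return [r for r in records if rng.random() < fraction]


def partition(records, n_parts):
    """Split records round-robin into chunks (a stand-in for distributed input splits)."""
    return [records[i::n_parts] for i in range(n_parts)]


def map_phase(chunk):
    """Map step: count, for one chunk, how many transactions contain each item."""
    counts = Counter()
    for transaction in chunk:
        counts.update(set(transaction))  # count each item once per transaction
    return counts


def reduce_phase(partials):
    """Reduce step: merge the per-chunk counts into global counts."""
    return reduce(lambda a, b: a + b, partials, Counter())


def frequent_items(records, min_support, fraction=1.0, n_parts=4, seed=0):
    """Return items whose support in the sample is at least `min_support` (hypothetical helper)."""
    sampled = sample_records(records, fraction, seed)
    if not sampled:
        return {}
    totals = reduce_phase(map_phase(chunk) for chunk in partition(sampled, n_parts))
    threshold = min_support * len(sampled)
    return {item: n for item, n in totals.items() if n >= threshold}


if __name__ == "__main__":
    transactions = [["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]]
    print(frequent_items(transactions, min_support=0.75))
```

In a real deployment the map and reduce steps would run on separate workers in a framework such as Hadoop. Sampling trades some accuracy for less data to process, so support values estimated from a sample are approximate.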