BIG Data: Implementation a Scala Approach for Large Scale Classification
Author(s) -
Yassine Sabri,
Najib El
Publication year - 2017
Publication title -
International Journal of Computer Applications
Language(s) - English
Resource type - Journals
ISSN - 0975-8887
DOI - 10.5120/ijca2017915123
Subject(s) - scala, computer science, scale (ratio), big data, data science, data mining, operating system, cartography, java, geography
Many scientific investigations require data-intensive research in which big data are collected and analyzed. To gain insight from big data, we must first develop initial hypotheses from the data and then test and validate those hypotheses against the data. We propose FS-S, a flexible and modular Scala-based implementation of the Fixed-Size Least Squares Support Vector Machine (FS-LSSVM) for large data sets. The framework consists of a set of modules for optimization (gradient-based and gradient-free), model representation, kernel functions, and evaluation of FS-LSSVM models. A kernel-based FS-LSSVM model is implemented in the proposed framework and relies heavily on the parallel computing capabilities of Apache Spark. Global optimization routines such as Coupled Simulated Annealing (CSA) and Grid Search are implemented and used to tune the hyper-parameters of the FS-LSSVM model. Finally, we carry out experiments on benchmark data sets such as Magic Gamma, Forest Cover, SUSY, and HIGGS, and evaluate the performance of various kernel-based FS-LSSVM models. Together, these elements reveal an effective and efficient way to perform closed-loop big data analysis with visualization and scalable computing.
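The hyper-parameter tuning step mentioned in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' Scala/Spark code, and it does not implement the fixed-size (primal, subsampled) variant: it fits a simplified regularized kernel least-squares classifier without a bias term, solving (K + I/gamma) alpha = y with an RBF kernel, and then runs an exhaustive grid search over the regularization constant and kernel bandwidth. All names (`rbf_kernel`, `solve`, `fit`, `predict`, `grid_search`), the toy data, and the candidate grids are assumptions made for illustration.

```python
import math
from itertools import product


def rbf_kernel(x, z, sigma):
    """Gaussian (RBF) kernel: exp(-||x - z||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit(X, y, gamma, sigma):
    """Solve (K + I / gamma) alpha = y. Assumption: bias term omitted."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], sigma) + (1.0 / gamma if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve(K, y)


def predict(X_train, alpha, sigma, x):
    """Classify x by the sign of the kernel expansion."""
    score = sum(a * rbf_kernel(xi, x, sigma) for a, xi in zip(alpha, X_train))
    return 1.0 if score >= 0 else -1.0


def grid_search(X_tr, y_tr, X_val, y_val, gammas, sigmas):
    """Return (validation error, gamma, sigma) for the best grid point."""
    best = None
    for gamma, sigma in product(gammas, sigmas):
        alpha = fit(X_tr, y_tr, gamma, sigma)
        wrong = sum(predict(X_tr, alpha, sigma, x) != t
                    for x, t in zip(X_val, y_val))
        err = wrong / len(y_val)
        if best is None or err < best[0]:
            best = (err, gamma, sigma)
    return best


# Hypothetical toy data: two well-separated clusters with labels -1 and +1.
X_train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]]
y_train = [-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]
X_val = [[0.5, 0.5], [3.5, 3.5]]
y_val = [-1.0, 1.0]

best_err, best_gamma, best_sigma = grid_search(
    X_train, y_train, X_val, y_val,
    gammas=[0.1, 1.0, 10.0], sigmas=[0.5, 1.0, 2.0])
```

Each grid point is evaluated independently of the others, which is the property that makes this kind of search straightforward to distribute across workers. Coupled Simulated Annealing, the other routine named in the abstract, is not shown here.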