SCALING EVOLUTIONARY PROGRAMMING WITH THE USE OF APACHE SPARK
Author(s) - Włodzimierz Funika, Paweł Koperek
Publication year - 2016
Publication title - Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.145
H-Index - 5
eISSN - 2300-7036
pISSN - 1508-2806
DOI - 10.7494/csci.2016.17.1.69
Subject(s) - computer science, bottleneck, spark (programming language), implementation, cloud computing, distributed computing, symbolic regression, service (business), big data, machine learning, genetic programming, data mining, software engineering, operating system, embedded system, economy, economics, programming language
Organizations across the globe gather more and more data, encouraged by cheap and easy-to-use cloud storage services. Large datasets require new approaches to analysis and processing, including methods based on machine learning. Symbolic regression in particular can provide many useful insights. Unfortunately, its high resource requirements may make it infeasible for the analysis of large-scale datasets. In this paper, we analyze a bottleneck in an open-source implementation of this method, which we call hubert. We find that evaluating individuals is the most costly operation. To address this, we propose a new evaluation service based on the Apache Spark framework, which attempts to speed up computations by executing them in a distributed manner on a cluster of machines. We assess the performance of the service by comparing evaluation execution times for a number of samples under both implementations. Finally, we draw conclusions and outline plans for further research.
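To illustrate why the evaluation of individuals lends itself to distribution, here is a minimal sketch in plain Python. It is not the paper's service and does not use Spark. The individuals, the dataset, and the helper names (`split`, `partial_error`, `evaluate`) are all hypothetical. It only shows the general pattern: each data partition yields a partial error for a candidate formula (a map step), and the partial results are then combined into one fitness value (a reduce step). On a cluster, the same two steps could plausibly be expressed with Spark's `RDD.map` and `RDD.reduce`.

```python
from functools import reduce

# Hypothetical individuals: candidate formulas produced by symbolic regression.
def individual_a(x):
    return 2.0 * x + 1.0

def individual_b(x):
    return x * x

def split(rows, n_parts):
    """Split rows into n_parts chunks (a local stand-in for cluster partitions)."""
    return [rows[i::n_parts] for i in range(n_parts)]

def partial_error(individual, part):
    """Map step: sum of squared errors and row count for one partition."""
    sse = sum((individual(x) - y) ** 2 for x, y in part)
    return (sse, len(part))

def evaluate(individual, partitions):
    """Reduce step: combine per-partition results into a mean squared error."""
    partials = [partial_error(individual, p) for p in partitions]
    sse, n = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), partials)
    return sse / n

# Toy dataset generated from y = 2x + 1 (illustrative only, not from the paper).
rows = [(float(x), 2.0 * x + 1.0) for x in range(10)]
partitions = split(rows, 3)

mse_a = evaluate(individual_a, partitions)  # exact match, so the error is zero
mse_b = evaluate(individual_b, partitions)  # poor fit, so the error is large
```

Because each partition is processed independently and only a small `(sse, count)` pair is returned per partition, the per-individual cost grows with the dataset size while the data exchanged stays small, which is the property a distributed evaluation service can exploit.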