Scale Genetic Programming for large Data Sets: Case of Higgs Bosons Classification
Author(s) -
Hmida Hmida,
Sana Ben Hamida,
Amel Borgi,
Marta Rukoz
Publication year - 2018
Publication title -
Procedia Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.334
H-Index - 76
ISSN - 1877-0509
DOI - 10.1016/j.procs.2018.07.264
Subject(s) - computer science, genetic programming, heuristic, set (abstract data type), artificial intelligence, higgs boson, field (mathematics), big data, data set, machine learning, benchmark (surveying), scale (ratio), theoretical computer science, data mining, mathematics, programming language, geodesy, pure mathematics, geography, physics, quantum mechanics
Extracting knowledge and significant information from very large data sets is a central topic in Data Science, attracting the interest of researchers in the machine learning field. Several machine learning techniques, such as Deep Neural Networks, have proven effective at dealing with massive data. Evolutionary algorithms are generally considered poorly suited to such problems because of their relatively high computational cost. This work is an attempt to show that, with some extensions, evolutionary algorithms can be an interesting solution for learning from very large data sets. We propose the use of Cartesian Genetic Programming (CGP) as a meta-heuristic approach to learn from the Higgs big data set. CGP is extended with an active sampling technique to help the algorithm cope with the mass of provided data. The proposed method takes up the challenge of processing the complete benchmark data set of 11 million events and produces satisfactory preliminary results.
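To illustrate the core idea of active sampling, the following is a minimal Python sketch, not the authors' implementation: instead of evaluating every candidate program on all 11 million events, each generation scores candidates on a small random subset of the training set. The function names, the toy data, and the trivial `candidate` classifier are all hypothetical placeholders; a real CGP individual would be an evolved function graph over the 28 Higgs features.

```python
import random

def active_sample(data, labels, sample_size, rng):
    """Draw a random subset of the training set for this generation's fitness evaluation."""
    idx = rng.sample(range(len(data)), sample_size)
    return [data[i] for i in idx], [labels[i] for i in idx]

def fitness(program, sample, sample_labels):
    """Classification accuracy of a candidate program on the active sample only."""
    predictions = [program(x) for x in sample]
    correct = sum(p == y for p, y in zip(predictions, sample_labels))
    return correct / len(sample)

# Toy data standing in for the Higgs benchmark (hypothetical 4-feature events).
rng = random.Random(42)
data = [[rng.random() for _ in range(4)] for _ in range(1000)]
labels = [int(sum(x) > 2.0) for x in data]

# A trivial stand-in "program"; in CGP this would be an evolved graph.
candidate = lambda x: int(sum(x) > 2.0)

# One generation's evaluation touches only sample_size events, so the cost per
# generation is independent of the full data set size.
sample, sample_labels = active_sample(data, labels, sample_size=100, rng=rng)
score = fitness(candidate, sample, sample_labels)
```

The design point is that the per-generation cost depends on `sample_size`, not on the full data set, which is what makes an evolutionary loop tractable at the 11-million-event scale; resampling each generation keeps selection pressure spread over the whole data set.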