
ANALYSIS OF NOVEL FEATURE SELECTION CRITERION BASED ON INTERACTIONS OF HIGHER ORDER IN CASE OF PRODUCTION PLANT DATA
Author(s) - Mateusz Pawluk, Dariusz Wierzba
Publication year - 2019
Publication title - Metody Ilościowe w Badaniach Ekonomicznych / Quantitative Methods in Economics
Language(s) - English
Resource type - Journals
eISSN - 2543-8565
pISSN - 2082-792X
DOI - 10.22630/mibe.2019.20.3.20
Subject(s) - feature selection, computer science, machine learning, artificial intelligence, selection (genetic algorithm), multitude, data mining, pipeline (software), classifier (uml), feature (linguistics), benchmark (surveying), philosophy, linguistics, geodesy, epistemology, programming language, geography
Feature selection plays a vital role in the processing pipeline of today’s data science applications and is a crucial step of the overall modeling process. Because large and highly structured data can now be extracted in many fields, feature selection remains a serious problem in machine learning, and no optimal solution has been proposed so far. In recent years, methods based on concepts from information theory have attracted particular attention and eventually led to a general framework. The criterion developed by author et al., IIFS (Interaction Information Feature Selection), extends state-of-the-art methods by incorporating higher-order interactions, both 3-way and 4-way. In this article, data from an industrial site were carefully selected in order to benchmark this approach against others. The results clearly show that including side effects in IIFS can significantly reorder the output set of features and improve the overall error estimate for the selected classifier.
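To make the notion of a 3-way interaction concrete, the following minimal sketch estimates interaction information between two discrete features and a class label from empirical frequencies. It is not the IIFS criterion itself, whose exact form is not given in this abstract; the function names (`entropy`, `mutual_information`, `interaction_information`) and the sign convention (positive values indicate synergy) are assumptions chosen for illustration.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in Counter(rows).values())


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def interaction_information(x1, x2, y):
    """3-way interaction: I((X1, X2); Y) - I(X1; Y) - I(X2; Y).

    Assumed convention: positive means the two features are jointly
    more informative about y than the sum of their individual parts.
    """
    joint = list(zip(x1, x2))  # treat the feature pair as one variable
    return (mutual_information(joint, y)
            - mutual_information(x1, y)
            - mutual_information(x2, y))


# XOR label: each feature alone carries no information about y,
# but together they determine it completely.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [0, 1, 1, 0]
print(interaction_information(x1, x2, y))
```

In the XOR case both pairwise mutual information terms are zero, so a criterion that scores features only individually or pairwise against redundancy would miss them; an interaction term is what lets such features enter the selected set.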