Open Access
Feature Selection for High Dimensional Data Using Monte Carlo Tree Search
Author(s) - Muhammad Umar Chaudhry, Jee-Hyong Lee
Publication year - 2018
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2018.2883537
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
Feature selection is a preliminary step in machine learning and data mining. It identifies the most important and relevant features within a dataset by eliminating redundant or irrelevant ones. The benefits can include higher prediction accuracy, reduced computational complexity, and more easily interpretable models. In this paper, we present a novel framework to investigate the role of Monte Carlo tree search (MCTS) in feature selection for very high-dimensional datasets. We construct a binary feature selection tree in which each node represents one of two feature states: the feature is selected or it is not. The search starts from an empty root node, reflecting that no feature is selected. The search tree is then expanded incrementally by adding nodes through MCTS-based simulations. Following the tree policy and the default policy, every iteration generates an initial feature subset, to which a filter is applied to select the top k features that form the candidate feature subset. The classification accuracy is used as the goodness, or reward, of the candidate feature subset and is propagated backward to the root node along the active path. Finally, the candidate subset with the highest reward is selected as the best feature subset. Experiments are performed on 30 real-world datasets, including 14 very high-dimensional microarray datasets, and the results are compared with state-of-the-art methods in the literature, which demonstrates the efficacy, validity, and significance of the proposed method.
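
The search loop described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the UCT selection rule, the uniform random default policy, the exploration constant `c`, and the names `Node`, `uct_child`, `mcts_feature_selection`, `relevance`, and `toy_reward` are all assumptions made for the example. In particular, `toy_reward` is a hypothetical stand-in for the classification accuracy that the paper uses as the reward, and `relevance` stands in for the scores of whatever filter is applied.

```python
import math
import random


class Node:
    """One node of the binary feature selection tree (illustrative).

    A node at depth d fixes the state of feature d-1: `selected` is True
    if that feature is included, False if not, and None for the empty root.
    """

    def __init__(self, parent=None, selected=None):
        self.parent = parent
        self.selected = selected
        self.children = {}        # maps True/False -> child Node
        self.visits = 0
        self.total_reward = 0.0


def uct_child(node, c=1.4):
    """Pick the child with the best UCT score (assumed tree policy)."""
    def score(child):
        mean = child.total_reward / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=score)


def mcts_feature_selection(n_features, relevance, reward_fn, k,
                           iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node()                 # empty root: no feature selected yet
    best_subset, best_reward = (), -math.inf
    for _ in range(iterations):
        node, decisions = root, []
        # Tree policy: descend while the current node is fully expanded.
        while len(decisions) < n_features and len(node.children) == 2:
            node = uct_child(node)
            decisions.append(node.selected)
        # Expansion: add one untried child (select / do not select).
        if len(decisions) < n_features:
            untried = [s for s in (True, False) if s not in node.children]
            choice = rng.choice(untried)
            node.children[choice] = Node(node, choice)
            node = node.children[choice]
            decisions.append(choice)
        # Default policy: decide the remaining features at random.
        while len(decisions) < n_features:
            decisions.append(rng.random() < 0.5)
        initial = [i for i, s in enumerate(decisions) if s]
        # Filter: keep the top-k features of the initial subset.
        top_k = sorted(initial, key=lambda i: relevance[i], reverse=True)[:k]
        candidate = tuple(sorted(top_k))
        reward = reward_fn(candidate) if candidate else 0.0
        if reward > best_reward:
            best_subset, best_reward = candidate, reward
        # Backpropagation: update the active path up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    return best_subset, best_reward


# Hypothetical toy problem: features 2, 5 and 7 are the informative ones.
informative = {2, 5, 7}
relevance = [0.1, 0.2, 0.9, 0.1, 0.3, 0.8, 0.2, 0.7, 0.1, 0.2]


def toy_reward(subset):
    """Stand-in for classifier accuracy: reward hits, penalize extras."""
    hits = len(informative & set(subset))
    return hits / len(informative) - 0.05 * (len(subset) - hits)


best_subset, best_reward = mcts_feature_selection(10, relevance, toy_reward, k=3)
print(best_subset, best_reward)
```

In a realistic setting, `reward_fn` would train and evaluate a classifier on the candidate columns (for example with cross-validation), and `relevance` would come from a filter score computed on the training data. Both choices are left open here because the abstract does not specify them.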
