Open Access
A category-based probabilistic approach to feature selection
Author(s) - Joel J. P. C. Rodrigues, Wenxue Huang, Yuanyi Pan
Publication year - 2017
Publication title - Big Data and Information Analytics
Language(s) - English
Resource type - Journals
eISSN - 2380-6974
pISSN - 2380-6966
DOI - 10.3934/bdia.2017020
Subject(s) - categorical variable, feature selection, probabilistic logic, reliability (semiconductor), feature (linguistics), selection (genetic algorithm), variable (mathematics), set (abstract data type), statistical model, sample (material), continuous variable, computer science, mathematics, statistics, artificial intelligence, mathematical analysis, power (physics), physics, linguistics, philosophy, chemistry, chromatography, quantum mechanics, programming language
A high-dimensional, large-sample categorical data set with a response variable may have many noninformative or redundant categories in its explanatory variables. Identifying and removing these categories usually not only improves the association but also gives rise to significantly higher statistical reliability of the selected features. A category-based probabilistic approach is proposed to achieve this goal. Supporting experiments are presented.
