Comparison of feature selection techniques in classifying stroke documents

Nur Syaza Izzati Mohd Rafei; Rohayanti Hassan; Rd. Rohmat Saedudin; Anis Farihan Mat Raffei; Zakaria Zalmiyah; Shahreen Kasim

Open Access

Publication year - 2019

Publication title - Indonesian Journal of Electrical Engineering and Computer Science

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.241

H-Index - 17

eISSN - 2502-4760

pISSN - 2502-4752

DOI - 10.11591/ijeecs.v14.i3.pp1244-1250

Subject(s) - feature selection, computer science, Pearson product-moment correlation coefficient, dimensionality reduction, classifier, artificial intelligence, correlation, support vector machine, data mining, machine learning, information retrieval, pattern recognition, mathematics, statistics

The volume of digital biomedical literature keeps growing, and many researchers find it difficult to manage and retrieve the information they need from the Internet. Applying text classification to biomedical literature is one way to address this problem, but the high dimensionality of text data is a common obstacle in text classification. The aim of this research is therefore to compare techniques for selecting the relevant features when classifying biomedical text abstracts. The research focuses on two feature selection techniques for reducing data dimensionality: Pearson's Correlation and Information Gain. To this end, we conducted and evaluated several experiments on a dataset of 100 stroke-related abstracts retrieved from the PubMed database. The dataset first underwent text pre-processing, a crucial step before the feature selection phase, in which Information Gain and Pearson's Correlation were applied. A Support Vector Machine (SVM) classifier was then used to evaluate and compare the effectiveness of the two feature selection techniques. On this dataset, Information Gain outperformed Pearson's Correlation by 3.3%. This research aims to extract meaningful features from a subset of stroke documents that can be used in various applications, especially in the diagnosis of stroke.
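The abstract does not state the exact formulas or tools used for the two feature selection techniques. As a minimal sketch, assuming binary term-presence features and two class labels, Information Gain and Pearson's Correlation could be used to rank terms as follows. The toy vocabulary, documents, labels, and the helper names (`information_gain`, `pearson`, `top_k`) are hypothetical and not taken from the paper; ranking Pearson scores by absolute value is also an assumption.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical scorer type: (feature column, labels) -> relevance score
Scorer = Callable[[Sequence[float], Sequence[int]], float]


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[float], labels: Sequence[int]) -> float:
    """IG = H(Y) - sum over feature values v of P(X=v) * H(Y | X=v)."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def pearson(feature: Sequence[float], labels: Sequence[int]) -> float:
    """Pearson's r between a feature column and 0/1 labels; 0.0 if either is constant."""
    n = len(labels)
    mx = sum(feature) / n
    my = sum(labels) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(feature, labels))
    sx = math.sqrt(sum((x - mx) ** 2 for x in feature))
    sy = math.sqrt(sum((y - my) ** 2 for y in labels))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def top_k(matrix: List[List[int]], labels: Sequence[int], vocab: Sequence[str],
          k: int, score: Scorer) -> List[str]:
    """Score every column of a document-term matrix and keep the k best terms."""
    columns = list(zip(*matrix))  # one tuple of values per term
    scored = [(term, score(col, labels)) for term, col in zip(vocab, columns)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [term for term, _ in scored[:k]]


# Hypothetical toy data: term presence (1/0) in 4 abstracts, two hypothetical classes
vocab = ["stroke", "ischemic", "patient", "study"]
matrix = [
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
labels = [1, 1, 0, 0]

ig_terms = top_k(matrix, labels, vocab, 2, information_gain)
pc_terms = top_k(matrix, labels, vocab, 2, lambda col, y: abs(pearson(col, y)))
print("Information Gain:", ig_terms)  # Information Gain: ['stroke', 'ischemic']
print("Pearson |r|:", pc_terms)       # Pearson |r|: ['stroke', 'ischemic']
```

On this tiny example both scores pick the same two terms; terms that occur in every document ("study") or are independent of the label ("patient") score zero under both. In the study itself, the selected features were then passed to an SVM classifier for evaluation; that step and the text pre-processing are not shown here.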
