Open Access
Optimization of Skewed Data Using Sampling-Based Preprocessing Approach
Author(s) - Sushruta Mishra, Pradeep Kumar Mallick, Lambodar Jena, GyooSoo Chae
Publication year - 2020
Publication title - Frontiers in Public Health
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.908
H-Index - 41
ISSN - 2296-2565
DOI - 10.3389/fpubh.2020.00274
Subject(s) - computer science, preprocessor, data pre-processing, data mining, resampling, sampling (signal processing), data classification, machine learning, class (philosophy), artificial intelligence, filter (signal processing), computer vision
Classification has evolved considerably over the past few years. As the volume of data gathered from different sources keeps growing, processing and analyzing it efficiently is becoming difficult. Uneven distribution of data among classes makes classification with machine-learning techniques harder still: most algorithms focus on the majority-class samples and ignore the minority-class data. Data skew is therefore a critical problem that needs researchers' attention. This paper focuses on data preprocessing with sampling techniques to overcome the data-skewing problem. Three sampling techniques, Resampling, SpreadSubSampling, and SMOTE, are applied to reduce the uneven class distribution, and the resulting data are classified with the K-nearest neighbor algorithm. Classification performance is evaluated with several performance metrics.
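
To make the three sampling ideas named in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The function names, the toy data set, and all parameter choices (k, the number of synthetic points, the random seed) are assumptions introduced here for illustration. `resample_uniform` draws with replacement toward an equal class split, `spread_subsample` trims every class to the size of the smallest one, and `smote` creates synthetic minority points by interpolating between a minority point and one of its nearest minority neighbors. A small K-nearest neighbor classifier is included so the effect of rebalancing can be checked.

```python
import math
import random
from collections import Counter

def resample_uniform(X, y, rng):
    """Resampling (assumed variant): draw with replacement so classes end up equal."""
    labels = sorted(set(y))
    per_class = len(y) // len(labels)
    out_X, out_y = [], []
    for label in labels:
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.choices(idx, k=per_class):
            out_X.append(X[i])
            out_y.append(label)
    return out_X, out_y

def spread_subsample(X, y, rng):
    """Undersampling: keep a random subset of each class, sized to the smallest class."""
    target = min(Counter(y).values())
    out_X, out_y = [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        for i in rng.sample(idx, target):
            out_X.append(X[i])
            out_y.append(label)
    return out_X, out_y

def smote(X, y, minority, k, n_new, rng):
    """SMOTE-style oversampling; assumes at least two minority points exist."""
    pts = [x for x, lab in zip(X, y) if lab == minority]
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pts)
        # k nearest minority neighbors of the chosen base point
        neighbors = sorted((p for p in pts if p is not base),
                           key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return X + synthetic, y + [minority] * n_new

def knn_predict(X, y, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], query))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

# Hypothetical skewed toy data: 40 majority points, 6 minority points.
majority = [[(i % 8) * 0.25, (i // 8) * 0.25] for i in range(40)]
minority = [[3.0, 3.0], [3.2, 3.1], [2.9, 3.3], [3.1, 2.8], [3.3, 3.2], [2.8, 2.9]]
X = majority + minority
y = ["major"] * len(majority) + ["minor"] * len(minority)

if __name__ == "__main__":
    rng = random.Random(0)
    for name, (Xb, yb) in {
        "resample": resample_uniform(X, y, rng),
        "subsample": spread_subsample(X, y, rng),
        "smote": smote(X, y, "minor", k=3, n_new=34, rng=rng),
    }.items():
        print(name, dict(Counter(yb)), knn_predict(Xb, yb, [3.0, 3.0]))
```

Undersampling discards majority data, while the two oversampling variants keep every original sample and add repeated or synthetic minority points; which trade-off works best depends on the data set and is exactly what evaluation with several metrics is meant to reveal.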
