Open Access
Cluster Sampling to Improve Classifier Accuracy for Numeric Data
Author(s) - Dr Lakshmi Sreenivasa Reddy D, M. Rajini
Publication year - 2019
Publication title - International Journal of Recent Technology and Engineering
Language(s) - English
Resource type - Journals
ISSN - 2277-3878
DOI - 10.35940/ijrte.b2848.078219
Subject(s) - categorical variable, computer science, classifier (uml), cluster analysis, data mining, sampling (signal processing), cluster sampling, artificial intelligence, pattern recognition (psychology), machine learning, population, demography, filter (signal processing), sociology, computer vision
Clustering is one of the essential techniques for grouping similar data. Improving model accuracy remains a challenge for every kind of data. Training and testing a classifier on the entire dataset is not feasible for large-scale data, so sampling is necessary for any modeling and is an important aspect of data mining. Models are typically trained and tested on different samples drawn by traditional techniques, as in the random forest ensemble method. In this paper, we propose cluster sampling, which is superior to other sampling methods in improving classifier accuracy. Samples drawn by the usual methods cannot cover all the variety present in the original data. Cluster sampling is a two-step approach: first it clusters the entire data, and second it selects samples from each cluster. These samples contain all varieties of data in equal proportion. Cluster sampling leverages a tree-based ensemble to handle categorical, numerical, and mixed-type data. Classifiers built on cluster-sampled data show higher accuracy than those built on samples from other sampling techniques.
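
The two-step approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes purely numeric data, uses a plain k-means as a stand-in for the clustering step (the abstract does not say which clustering algorithm is used), and interprets "equal proportion" as drawing the same number of points from each cluster. The names `kmeans`, `cluster_sample`, `k`, and `per_cluster` are hypothetical.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on numeric tuples; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initial centers are random data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def cluster_sample(points, k, per_cluster, seed=0):
    """Step 1: cluster all points. Step 2: draw up to per_cluster indices from each cluster."""
    rng = random.Random(seed)
    labels = kmeans(points, k, seed=seed)
    chosen = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        chosen.extend(rng.sample(idx, min(per_cluster, len(idx))))
    return sorted(chosen), labels


# Toy numeric data: two synthetic groups (illustrative only, not from the paper).
toy_rng = random.Random(1)
data = [(toy_rng.gauss(0, 1), toy_rng.gauss(0, 1)) for _ in range(50)]
data += [(toy_rng.gauss(10, 1), toy_rng.gauss(10, 1)) for _ in range(50)]

sample_idx, labels = cluster_sample(data, k=2, per_cluster=5)
train = [data[i] for i in sample_idx]  # sampled rows that would be passed to a classifier
```

Because every cluster contributes to the sample, regions of the data that simple random sampling might miss are still represented, which is the coverage property the abstract attributes to cluster sampling.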
