Feature Selection using Genetic Algorithm for Clustering high Dimensional Data

K. Kouser; Amrita Priyam

Open Access

Author(s) - K. Kouser, Amrita Priyam

Publication year - 2018

Publication title - International Journal of Engineering and Technology

Language(s) - English

Resource type - Journals

ISSN - 2227-524X

DOI - 10.14419/ijet.v7i2.11.11001

Subject(s) - cluster analysis, feature (linguistics), pattern recognition (psychology), feature selection, clustering high dimensional data, algorithm, subspace topology, feature vector, genetic algorithm, set (abstract data type), computer science, data mining, data set, canopy clustering algorithm, correlation clustering, cure data clustering algorithm, artificial intelligence, mathematics, machine learning, programming language, philosophy, linguistics

Clustering high-dimensional data is one of the open problems of modern data mining. This paper proposes a new technique, GA-HDClustering, which works in two steps. First, a GA-based feature selection algorithm determines the optimal feature subset, that is, a subset consisting of the important features of the entire data set. Next, a K-means algorithm is applied to the optimal feature subset to find the clusters. As a baseline, the traditional K-means algorithm is applied to the full-dimensional feature space, and the results of the two approaches are compared. The comparison uses several validity metrics: sum of squared error (SSE), within-group average distance (WGAD), between-group distance (BGD), and the Davies-Bouldin index (DBI). GA-HDClustering uses a genetic algorithm to search for an effective feature subspace within the large feature space formed by all dimensions of the data set. Experiments on a standard data set showed that GA-HDClustering is superior to the traditional clustering algorithm.
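The two-step idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the chromosome encoding, the fitness function, or the genetic operators, so everything below is an assumption made for illustration: a binary feature mask as the chromosome, the Davies-Bouldin index of a K-means run as the fitness (lower is better), tournament selection, single-point crossover, bit-flip mutation, and elitism. All function names (`ga_feature_select`, `toy_data`, and so on) and the toy data are hypothetical and are not taken from the paper.

```python
import math
import random

def dist(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's K-means; returns (labels, centroids)."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def sse(points, labels, centroids):
    """Sum of squared distances from each point to its cluster centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

def davies_bouldin(points, labels, centroids):
    """Davies-Bouldin index; lower values mean compact, well-separated clusters."""
    k = len(centroids)
    scatter = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            return math.inf  # degenerate clustering
        scatter.append(sum(dist(p, centroids[c]) for p in members) / len(members))
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if i != j:
                sep = dist(centroids[i], centroids[j])
                if sep == 0:
                    return math.inf
                worst = max(worst, (scatter[i] + scatter[j]) / sep)
        total += worst
    return total / k

def project(points, mask):
    """Keep only the features whose mask bit is 1."""
    return [tuple(x for x, keep in zip(p, mask) if keep) for p in points]

def fitness(points, mask, k, seed):
    """Assumed fitness: DBI of K-means on the selected features."""
    if not any(mask):
        return math.inf
    sub = project(points, mask)
    labels, centroids = kmeans(sub, k, random.Random(seed))
    return davies_bouldin(sub, labels, centroids)

def ga_feature_select(points, k, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Search for a feature mask with a simple elitist genetic algorithm."""
    rng = random.Random(seed)
    d = len(points[0])
    cache = {}
    def score(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(points, mask, k, seed)
        return cache[key]
    # Include the full feature set so the result is never worse than the baseline.
    pop = [[1] * d] + [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score)
        nxt = [pop[0][:]]  # elitism: carry the best mask over unchanged
        while len(nxt) < pop_size:
            a = min(rng.sample(pop, 2), key=score)  # tournament selection
            b = min(rng.sample(pop, 2), key=score)
            cut = rng.randrange(1, d)               # single-point crossover
            child = a[:cut] + b[cut:]
            nxt.append([bit ^ (rng.random() < p_mut) for bit in child])  # bit-flip mutation
        pop = nxt
    return min(pop, key=score)

def toy_data(n=40, seed=1):
    """Hypothetical data: features 0-1 separate two groups, features 2-5 are noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        centre = 0.0 if i < n // 2 else 5.0
        signal = [rng.gauss(centre, 0.5) for _ in range(2)]
        noise = [rng.uniform(0.0, 10.0) for _ in range(4)]
        rows.append(tuple(signal + noise))
    return rows

data = toy_data()
best_mask = ga_feature_select(data, k=2)
full_mask = [1] * len(data[0])
print("selected features:", [i for i, bit in enumerate(best_mask) if bit])
print("DBI full / selected:", fitness(data, full_mask, 2, 0), fitness(data, best_mask, 2, 0))
```

Seeding the population with the all-ones mask and keeping the best individual each generation means the selected subset can never score worse than full-dimensional K-means under this fitness. SSE is included only as a reporting metric: using it as the fitness would tend to favour masks with fewer features, since dropping dimensions lowers squared distances regardless of cluster quality.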
