Outlier Reduction using Hybrid Approach in Data Mining
Author(s) -
Nancy Lekhi,
Manish Mahajan
Publication year - 2015
Publication title -
International Journal of Modern Education and Computer Science
Language(s) - English
Resource type - Journals
eISSN - 2075-017X
pISSN - 2075-0161
DOI - 10.5815/ijmecs.2015.05.06
Subject(s) - outlier , computer science , anomaly detection , data mining , cluster analysis , artificial neural network , pattern recognition (psychology) , k means clustering , data set , set (abstract data type) , reduction (mathematics) , artificial intelligence , cluster (spacecraft) , mathematics , geometry , programming language
Outlier detection is a very active area of research in data mining, where an outlier is a data item that does not match the rest of the data in a dataset. In existing approaches, outlier detection is performed only on numeric datasets. When clustering is used for outlier detection, these methods mainly treat the elements lying outside the clusters as outliers. However, some anomalous elements may, for various reasons, become part of a cluster, so they must also be considered. The proposed method uses a hybrid approach to reduce the number of outliers, since the number of outliers can only be reduced by improving the cluster formation method. It combines two data mining techniques for cluster formation: weighted k-means and a neural network. Weighted k-means is a clustering technique that can be applied to text and date datasets as well as numeric ones, and it assigns a weight to each element in the dataset. The output of weighted k-means becomes the input to the neural network, a data mining technique used for both classification and clustering. The neural network is first trained and then tests the clusters formed by weighted k-means to ensure that their elements are grouped correctly. Many outlier detection methods exist in data mining; the proposed method uses Integrating Semantic Knowledge (SOF) for outlier detection. This method detects semantic outliers, where a semantic outlier is a data point that behaves differently from the other data points in the same class or cluster. The main aim of this work is to reduce the number of outliers by improving the cluster formation methods, thereby lowering the outlier rate, decreasing the mean squared error, and improving accuracy.
The simulation results show that the proposed method performs well, significantly reducing the number of outliers.
Index Terms: Data Mining, Clustering, Weighted K-means, Neural Network, Outlier, SOF
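The abstract does not specify the weighting scheme, the network architecture, or the SOF formula, so the following is only a rough sketch under stated assumptions, not the authors' method. It assumes per-point weights that scale each point's pull on its cluster centroid and uses farthest-first initialization for determinism. In place of the neural-network check and SOF scoring it uses a simple stand-in that targets the in-cluster outliers the abstract describes: a point is flagged when its distance to its own centroid exceeds the cluster's mean distance by more than `z` standard deviations (`z = 2.0` is an arbitrary choice). It also reports the mean squared error. The names `weighted_kmeans`, `flag_outliers`, and `mean_squared_error` are hypothetical.

```python
import math
from typing import List, Sequence


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: _sq_dist(p, centroids[c]))


def weighted_kmeans(points, weights, k, max_iter=100):
    """Lloyd-style k-means where each point carries a non-negative weight.

    Assumption: weights only affect the update step, so each centroid is the
    weight-averaged position of its members. Initialization is farthest-first.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(_sq_dist(p, c) for c in centroids))
        centroids.append(list(far))

    labels: List[int] = []
    dim = len(points[0])
    for _ in range(max_iter):
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels) if lab == c]
            total = sum(w for _, w in members)
            if total > 0:  # keep the old centroid if the cluster is empty
                centroids[c] = [sum(w * p[d] for p, w in members) / total
                                for d in range(dim)]
    return centroids, labels


def flag_outliers(points, centroids, labels, z=2.0):
    """Stand-in for SOF (hypothetical): flag cluster members whose distance
    to their own centroid exceeds the cluster mean by more than z std devs."""
    dists = [math.sqrt(_sq_dist(p, centroids[lab])) for p, lab in zip(points, labels)]
    flags = [False] * len(points)
    for c in range(len(centroids)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if len(idx) < 2:
            continue
        ds = [dists[i] for i in idx]
        mean = sum(ds) / len(ds)
        std = math.sqrt(sum((d - mean) ** 2 for d in ds) / len(ds))
        for i in idx:
            if std > 0 and dists[i] > mean + z * std:
                flags[i] = True
    return flags


def mean_squared_error(points, centroids, labels):
    """Average squared distance from each point to its assigned centroid."""
    return sum(_sq_dist(p, centroids[lab]) for p, lab in zip(points, labels)) / len(points)
```

For example, with a tight group near (0, 0) that also absorbs the point (3, 3), and a second group near (10, 10), `weighted_kmeans(points, [1.0] * len(points), k=2)` places (3, 3) in the first cluster, and `flag_outliers` then marks it as the only in-cluster outlier even though it lies inside a cluster.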