Open Access
A New Paradigm for Development of Data Imputation Approach for Missing Value Estimation
Author(s) -
G. Madhu,
G. Nagachandrika
Publication year - 2016
Publication title -
International Journal of Electrical and Computer Engineering
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.277
H-Index - 22
ISSN - 2088-8708
DOI - 10.11591/ijece.v6i6.pp3222-3228
Subject(s) - missing data , imputation (statistics) , computer science , data mining , centroid , curse of dimensionality , cluster analysis , artificial intelligence , machine learning
A common issue in data analysis across many real-world applications is the presence of missing data values, which poses a challenge in domains such as wireless sensor networks, medical applications, and psychology. Learning and prediction in the presence of missing values can be treacherous in machine learning, data mining, and statistical analysis. A missing value can itself signify important information about the dataset in the mining process, so handling missing values is a challenging task for data mining. In this paper, we propose a new paradigm for developing a data imputation method for missing value estimation based on centroids and nearest neighbours. First, clusters are identified with the k-means algorithm, and their centroids and nearest-neighbour data records are calculated. Second, the distances of the records in the complete dataset and in the incomplete dataset from the centroids are computed and the nearest data record is estimated, a step that tends to suffer from the curse of dimensionality. Finally, the missing value is imputed from the nearest-neighbour record using a statistical measure, the z-score. The experimental study demonstrates the strength of the proposed paradigm for missing value imputation. Tests were run on different types of datasets to validate the approach and to compare its results with other imputation methods such as KNNI, SVMI, WKNNI, KMI, and FKNNI. The proposed approach is geared towards maximizing the utility of imputation with respect to missing value estimation.
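The three steps summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say exactly how the z-score enters the imputation, so the sketch assumes that each feature is z-score standardised on the complete records, that distances are measured over the observed features only, and that the value copied from the nearest neighbour is converted back to the original units. The function names (`impute_missing`, `kmeans`), the farthest-point initialisation, and the choice `k=2` are all hypothetical and chosen for illustration.

```python
import math
import statistics


def dist(a, b, idx):
    """Euclidean distance between a and b over the feature indices in idx."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in idx))


def kmeans(points, k, iters=50):
    """Plain k-means with a deterministic farthest-point initialisation (assumption)."""
    dims = range(len(points[0]))
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c, dims) for c in centroids))
        )
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: dist(p, centroids[c], dims))
            clusters[j].append(p)
        new = [
            [statistics.fmean(col) for col in zip(*cl)] if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new == centroids:  # assignments are stable, so stop
            break
        centroids = new
    return centroids, clusters


def impute_missing(records, k=2):
    """Fill None entries from the nearest complete record in the nearest cluster."""
    complete = [r for r in records if None not in r]
    cols = list(zip(*complete))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # guard constant columns

    def to_z(r):
        # Assumed role of the z-score: put all features on a common scale.
        return [None if v is None else (v - m) / s for v, m, s in zip(r, means, stds)]

    centroids, clusters = kmeans([to_z(r) for r in complete], k)
    filled = []
    for r in records:
        if None not in r:
            filled.append(list(r))
            continue
        z = to_z(r)
        observed = [i for i, v in enumerate(z) if v is not None]
        # Step 1: nearest centroid, using only the observed features.
        j = min(
            (c for c in range(k) if clusters[c]),
            key=lambda c: dist(z, centroids[c], observed),
        )
        # Step 2: nearest complete record inside that cluster.
        nb = min(clusters[j], key=lambda p: dist(z, p, observed))
        # Step 3: copy the neighbour's value, mapped back from z-scores.
        filled.append(
            [v if v is not None else nb[i] * stds[i] + means[i] for i, v in enumerate(r)]
        )
    return filled
```

For example, calling `impute_missing` on a list of equal-length numeric records, with `None` marking each missing entry, returns a new list in which every `None` has been replaced and all observed values are left unchanged. Restricting the neighbour search to a single cluster keeps the number of distance computations small, at the cost of depending on the quality of the clustering.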