A Novel Approach for Data Cleaning by Selecting the Optimal Data to Fill the Missing Values for Maintaining Reliable Data Warehouse
Author(s) -
Raju Dara,
Ch. Satyanarayana,
A. Govardhan
Publication year - 2016
Publication title -
international journal of modern education and computer science
Language(s) - English
Resource type - Journals
eISSN - 2075-017X
pISSN - 2075-0161
DOI - 10.5815/ijmecs.2016.05.08
Subject(s) - computer science , byte , data warehouse , jaccard index , quality (philosophy) , data mining , cluster analysis , function (biology) , upgrade , data quality , information retrieval , pipeline (software) , haystack , data science , database , world wide web , artificial intelligence , metric (unit) , philosophy , operations management , epistemology , evolutionary biology , biology , economics , programming language , operating system
At present trillion of bytes of information is being created by projects particularly in web. To accomplish the best choice for business benefits, access to that information in a very much arranged and intuitive way is dependably a fantasy of business administrators and chiefs. Information warehouse is the main feasible arrangement that can bring the fantasy into reality. The upgrade of future attempts to settle on choices relies on upon the accessibility of right data that depends on nature of information basic. The quality information must be created by cleaning information preceding stacking into information distribution center following the information gathered from diverse sources will be grimy. Once the information have been pre-prepared and purified then it produces exact results on applying the information mining question. There are numerous cases where the data is sparse in nature. To get accurate results with sparse data is hard. In this paper the main goal is to fill the missing values in acquired data which is sparse in nature. Precisely caution must be taken to choose minimum number of text pieces to fill the holes for which we have used Jaccard Dissimilarity function for clustering the data which is frequent in nature.
Accelerating Research
Robert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom
Address
John Eccles HouseRobert Robinson Avenue,
Oxford Science Park, Oxford
OX4 4GP, United Kingdom