Open Access
Distance Based Measure of Data Quality
Author(s) - Pavol Kráľ, Lukáš Sobíšek, Mária Stachová
Publication year - 2014
Publication title - Metodološki zvezki
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.127
H-Index - 7
eISSN - 1854-0031
pISSN - 1854-0023
DOI - 10.51936/npie9973
Data quality is an important factor in the validity of information extracted from data sets by statistical or data mining procedures. In this paper, we propose a description of data quality that characterizes the quality of the whole data set as well as the quality of individual variables and individual cases. Based on this description, we define a distance based measure of data quality for individual cases as the distance of each case from an ideal case. Such a measure can provide additional information when preparing a training data set, fitting models, or making decisions based on the results of analyses. It can be used in various ways, ranging from a simple weighting function to belief functions.
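The abstract describes the case-level measure only in words (the distance of a case from the ideal one). As a minimal sketch, assume each cell of the data set has already been given a quality score on a [0, 1] scale, where 1 means no detected problem and 0 means, for example, a missing or invalid value; the ideal case is then the vector of all ones. The Euclidean distance, its normalization by the square root of the number of variables, the mean-based aggregation for variables and for the whole data set, and all function names below are illustrative assumptions, not the authors' definitions.

```python
import math
from typing import List, Sequence

# Hypothetical representation: quality[i][j] in [0, 1] is the quality score
# of variable j in case i (1 = ideal, 0 = worst, e.g. missing or invalid).
QualityMatrix = List[List[float]]


def case_distance(scores: Sequence[float]) -> float:
    """Normalized Euclidean distance of one case from the ideal case (all ones)."""
    if not scores:
        raise ValueError("a case must have at least one quality score")
    p = len(scores)
    # Dividing by sqrt(p) keeps the distance in [0, 1] for scores in [0, 1].
    return math.sqrt(sum((1.0 - s) ** 2 for s in scores)) / math.sqrt(p)


def case_weight(scores: Sequence[float]) -> float:
    """Simple weighting function: 1 for the ideal case, 0 for the worst case."""
    return 1.0 - case_distance(scores)


def variable_quality(quality: QualityMatrix) -> List[float]:
    """Mean quality score of each variable (column) across all cases."""
    n = len(quality)
    return [sum(row[j] for row in quality) / n for j in range(len(quality[0]))]


def dataset_quality(quality: QualityMatrix) -> float:
    """Mean quality score over all cells of the data set."""
    cells = [s for row in quality for s in row]
    return sum(cells) / len(cells)


if __name__ == "__main__":
    quality = [
        [1.0, 1.0, 1.0, 1.0],  # ideal case
        [1.0, 0.0, 1.0, 1.0],  # one variable fully defective
        [0.5, 0.5, 0.5, 0.5],  # every variable partially defective
    ]
    print([case_weight(row) for row in quality])  # [1.0, 0.5, 0.5]
    print(dataset_quality(quality))  # 0.75
```

With this scaling, `case_weight` gives the ideal case weight 1 and a case whose scores are all 0 weight 0, so its output could be passed as observation weights when fitting a model. The belief-function uses mentioned in the abstract are not covered by this sketch.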