User-Oriented Approach to Data Quality Evaluation
Author(s) - Anastasija Nikiforova, Jānis Bičevskis, Zane Bičevska, Ivo Odītis
Publication year - 2020
Publication title - JUCS - Journal of Universal Computer Science
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.284
H-Index - 53
eISSN - 0948-695X
pISSN - 0948-6968
DOI - 10.3897/jucs.2020.007
Subject(s) - computer science, executable, data quality, data model (gis), quality (philosophy), data mining, data modeling, database, object (grammar), programming language, engineering, artificial intelligence, metric (unit), philosophy, operations management, epistemology
The paper proposes a new data object-driven approach to data quality evaluation. The approach consists of three main components: (1) a data object, (2) data quality requirements, and (3) a data quality evaluation process. Because data quality is relative in nature, the data object and the quality requirements are (a) use-case dependent and (b) defined by the user in accordance with their needs. All three components of the presented data quality model are described using graphical Domain Specific Languages (DSLs). In accordance with Model-Driven Architecture (MDA), the data quality model is built in two steps: (1) creating a platform-independent model (PIM), and (2) converting the created PIM into a platform-specific model (PSM). The PIM comprises informal specifications of data quality. The PSM describes the implementation of the data quality model, which makes the model executable, so that data objects can be scanned and data quality defects and anomalies detected. The proposed approach was applied to open data sets to analyse their quality. At least three advantages were highlighted: (1) a graphical data quality model allows data quality to be defined by people who are neither IT nor data quality experts, as the presented diagrams are easy to read, create and modify; (2) the data quality model allows "third-party" data to be analysed without deeper knowledge of how the data were accrued and processed; (3) the quality of the data can be described at two or more levels of abstraction: informally, using natural language, or formally, by including executable artefacts such as SQL statements.
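The three components and the two levels of description can be illustrated with a minimal sketch. This is not the authors' implementation, and it replaces their graphical DSLs with plain Python. The `company` table, its columns, the sample rows, and the names `QualityRequirement`, `build_demo`, `evaluate` and `REQUIREMENTS` are all assumptions made up for illustration. Each requirement pairs an informal natural-language statement (roughly the PIM level) with an executable SQL statement (roughly the PSM level) that selects the records violating it.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class QualityRequirement:
    """One user-defined requirement on the data object (hypothetical structure)."""
    name: str
    informal: str  # natural-language statement, roughly the PIM level
    sql: str       # executable check, roughly the PSM level; returns ids of violating rows


def build_demo() -> sqlite3.Connection:
    """Create an in-memory data object with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE company (id INTEGER, name TEXT, reg_date TEXT)")
    conn.executemany(
        "INSERT INTO company VALUES (?, ?, ?)",
        [
            (1, "Alpha", "2015-03-01"),
            (2, None, "2016-07-12"),     # missing name
            (3, "Gamma", "20-13-2017"),  # malformed date
            (4, "", "2018-01-30"),       # empty name
        ],
    )
    return conn


# Requirements are use-case dependent: another user could define a different list.
REQUIREMENTS = [
    QualityRequirement(
        name="name_present",
        informal="Every company must have a non-empty name.",
        sql="SELECT id FROM company "
            "WHERE name IS NULL OR TRIM(name) = '' ORDER BY id",
    ),
    QualityRequirement(
        name="reg_date_valid",
        informal="The registration date must be a valid date in YYYY-MM-DD form.",
        sql="SELECT id FROM company "
            "WHERE reg_date IS NULL OR date(reg_date) IS NULL "
            "OR date(reg_date) <> reg_date ORDER BY id",
    ),
]


def evaluate(conn: sqlite3.Connection, requirements) -> dict:
    """Evaluation process: run each check and collect the ids of defective rows."""
    report = {}
    for req in requirements:
        rows = conn.execute(req.sql).fetchall()
        report[req.name] = [row[0] for row in rows]
    return report


if __name__ == "__main__":
    connection = build_demo()
    for name, defective_ids in evaluate(connection, REQUIREMENTS).items():
        print(name, defective_ids)
```

Keeping the informal text next to the SQL means a reader who is not a database specialist can still review what each check is meant to enforce, while the evaluation step only needs the executable part.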
