
Developing a systematic approach to assessing data quality in secondary use of clinical data based on intended use
Author(s) - Razzaghi Hanieh, Greenberg Jane, Bailey L. Charles
Publication year - 2022
Publication title - Learning Health Systems
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.501
H-Index - 9
ISSN - 2379-6146
DOI - 10.1002/lrh2.10264
Subject(s) - computer science , metadata , data science , data quality , completeness (order theory) , context (archaeology) , systematic review , quality (philosophy) , construct (python library) , process (computing) , data mining , information retrieval , medline , world wide web , engineering , metric (unit) , operations management , philosophy , epistemology , law , political science , mathematical analysis , paleontology , mathematics , biology , programming language , operating system
Secondary use of electronic health record (EHR) data for research requires that the data are fit for use. Data quality (DQ) frameworks have traditionally focused on structural conformance and completeness of clinical data extracted from source systems. In this paper, we propose a framework for evaluating semantic DQ that allows researchers to assess fitness for use prior to analysis.

Methods: We reviewed current DQ literature, as well as experience from recent multisite network studies, and identified gaps in the literature and current practice. We used the principles derived from this review to construct the conceptual framework, with attention to both analytic fitness and informatics practice.

Results: We developed a systematic framework that guides researchers in assessing whether a data source is fit for use for their intended study or project. It combines tools for evaluating clinical context with DQ principles and factors in the characteristics of the data source in order to develop semantic DQ checks.

Conclusions: Our framework provides a systematic process for DQ development. Further work is needed to codify practices and metadata around both structural and semantic data quality.
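
The distinction the abstract draws between structural and semantic DQ can be illustrated with a minimal sketch. This is not the authors' framework or tooling. The record layout, the field names (`patient_id`, `code`, `value`), the toy data, and the diabetes/HbA1c use case are all hypothetical, chosen only to show how a data set can pass a structural completeness check and still fall short for a specific intended use.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical record shape: one lab result per dict.
Record = Dict[str, Optional[str]]


def completeness(records: List[Record], field: str) -> float:
    """Structural check: share of records in which `field` is populated."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def semantic_coverage(
    cohort_ids: Iterable[str], records: List[Record], expected_code: str
) -> float:
    """Semantic check tied to an intended use: share of cohort patients
    who have at least one record carrying `expected_code`."""
    cohort = set(cohort_ids)
    if not cohort:
        return 0.0
    covered = {
        r["patient_id"]
        for r in records
        if r.get("code") == expected_code and r.get("patient_id") in cohort
    }
    return len(covered) / len(cohort)


# Hypothetical toy data: every lab row has a value, so it looks complete.
LABS: List[Record] = [
    {"patient_id": "p1", "code": "hba1c", "value": "7.1"},
    {"patient_id": "p2", "code": "glucose", "value": "98"},
    {"patient_id": "p3", "code": "glucose", "value": "110"},
]

# Hypothetical intended use: a study of diabetes patients that needs HbA1c.
DIABETES_COHORT = ["p1", "p2"]

structural_score = completeness(LABS, "value")
semantic_score = semantic_coverage(DIABETES_COHORT, LABS, "hba1c")
```

In this toy case the structural score is 1.0 while the semantic score is 0.5: every row is populated, yet only half of the cohort has the measurement the hypothetical study depends on. That gap is the kind of fitness-for-use question a semantic DQ check is meant to surface.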