Preservation of Knowledge - Data processing in the Danish Data Archives
Author(s) - Anne Sofie Fink Kjeldgaard, Søren Priisholm, Birgitte Grønlund Jensen
Publication year - 2005
Publication title - IASSIST Quarterly
Language(s) - English
Resource type - Journals
eISSN - 2331-4141
pISSN - 0739-1137
DOI - 10.29173/iq865
Subject(s) - Danish, computer science, data processing, data science, information retrieval, database, philosophy, linguistics
High-quality secondary analysis of sample surveys depends on the quality of the primary data sets and on how well they are preserved. Researchers performing secondary analysis must be able to access as much information about the data sets as possible. In the Danish Data Archives (DDA), great effort is made to preserve data sets in a way that meets the needs of the secondary researcher. For this reason, data processing is a core operation in the DDA, and great importance is attached to producing reliable and useful documentation of the preserved data files.

Data processing is a core activity in the Danish Data Archives (DDA). It is data processing that makes the DDA unique compared with other data preservation efforts in the Danish research world. Despite its importance, data processing is invisible to outsiders. In this article we discuss the advantages of data processing for the depositor of a data set, for end-users performing secondary analysis on the data set, and for the data archive. In this light, we then describe our preservation strategy in detail. Finally, we guide readers through the data processing workflow step by step and point to future developments.

The Advantages of Data Processing

Data processing has advantages for the depositors, the end-users and the data archive. These advantages can be traced back to the effort of collecting and integrating all available information about a data set. The physical product of data processing in the DDA is a Data Documentation Publication (DDP), consisting of a study description and a codebook with frequency tables for all variables in the data set. The study description holds all information about the creation of the data set and its preservation in the DDA, as well as any restrictions on access to the data set.
The codebook consists of all available documentation about the data file and a copy of the original questionnaire.

Data Deposition

Although compiling the information for the study description and the codebook ought to be a straightforward task, it is usually a time-consuming job. It becomes especially difficult when a long time has passed since the survey was carried out. To make it as easy as possible to gather information about the study, the DDA urges researchers to deposit their data as early as possible in the research process. After a data set has gone through data processing, a DDP is created. The depositor is informed that her study has a DDP, i.e., a study description and a codebook. She can then be sure that her data set is preserved for the future and that she has no need to store the data elsewhere.

Data Location

As soon as a data set has a study description, it becomes searchable in the DDA's online data catalogue. A future challenge in this respect is to allow users to search not only the study description but the whole DDP. The DDA is currently taking part in the MADIERA project, which, among other things, will offer users this opportunity.

Secondary Analysis

For data analysis, data processing offers essential advantages to end-users. First and foremost, all information about the origin and contents of the data set is readily accessible, because it is gathered in the DDP. However, the DDP is not just a collection of information provided by the depositor. During data processing, several additions, standardisations and checks are made.
For example, if a divergence between the data and the questionnaire is found, the person in charge of data processing notes it in the codebook. Future users of that data set therefore need not spend time tracking down “wild codes” (values in the data that do not correspond to any code in the questionnaire) themselves.

Data Archiving

Internally, data processing has the advantage that users seldom need guidance after receiving a data set. As a consequence, the time spent on user services has been reduced even though use of our data sets has increased over the years.
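The two codebook tasks mentioned above, a frequency table for every variable and a check for “wild codes” that diverge from the questionnaire, can be illustrated with a small sketch. It is not the DDA's actual tooling: the questionnaire dictionary, the variable names `sex` and `vote`, their codes, and the helpers `codebook_entry` and `build_codebook` are all hypothetical, and every record is assumed to contain each questionnaire variable.

```python
from collections import Counter

# Hypothetical questionnaire: valid codes and labels for each variable.
QUESTIONNAIRE = {
    "sex": {1: "Male", 2: "Female", 9: "No answer"},
    "vote": {1: "Yes", 2: "No", 8: "Don't know", 9: "No answer"},
}


def codebook_entry(name, values, valid_codes):
    """Build a frequency table for one variable and flag its wild codes.

    A wild code is a value that occurs in the data but is not among the
    questionnaire's valid codes for that variable.
    """
    counts = Counter(values)
    total = len(values)
    rows = []
    for code in sorted(counts):
        # Unknown codes stay in the table, labelled so the divergence is visible.
        label = valid_codes.get(code, "WILD CODE")
        pct = round(100 * counts[code] / total, 1)
        rows.append((code, label, counts[code], pct))
    wild = sorted(code for code in counts if code not in valid_codes)
    return {"variable": name, "frequencies": rows, "wild_codes": wild}


def build_codebook(records, questionnaire):
    """Return one codebook entry per questionnaire variable (assumes complete records)."""
    return [
        codebook_entry(var, [record[var] for record in records], codes)
        for var, codes in questionnaire.items()
    ]


if __name__ == "__main__":
    survey = [
        {"sex": 1, "vote": 1},
        {"sex": 2, "vote": 2},
        {"sex": 2, "vote": 3},  # 3 is not a valid code for "vote"
        {"sex": 1, "vote": 9},
    ]
    for entry in build_codebook(survey, QUESTIONNAIRE):
        print(entry["variable"], "wild codes:", entry["wild_codes"])
        for code, label, n, pct in entry["frequencies"]:
            print(f"  {code:>3}  {label:<12} {n:>3}  {pct:5.1f}%")
```

Keeping wild codes in the frequency table, instead of silently dropping them, matches the practice described above of recording the divergence in the codebook so that later users see it at once.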