Data linkage for querying heterogeneous databases
Author(s) - Mohammed Gollapalli
Publication year - 2013
Publication title - Queensland's institutional digital repository (The University of Queensland)
Language(s) - English
Resource type - Dissertations/theses
DOI - 10.14264/uql.2016.476
Subject(s) - computer science , tuple , data mining , cluster analysis , semantic heterogeneity , schema (genetic algorithms) , information retrieval , schema matching , data integration , scalability , matching (statistics) , database , ontology based data integration , artificial intelligence , semantic web , statistics , mathematics , discrete mathematics
Data Linkage is an important step that can provide valuable insights for evidence-based decision making, especially for crucial events. Performing sensible queries across heterogeneous databases containing millions of records is a complex task that requires a complete understanding of each contributing database's schema in order to define the structure of its information. The key aim is to approximate the structure and content of the induced data in a concise synopsis from which meaningful facts can be extracted and linked. Current techniques focus primarily on pair-wise attribute matching and pay little attention to discovering direct and weighted cluster correlations for linking semantically equivalent datasets. We identify four major research issues in Data Linkage: the costs associated with pair-wise matching, record matching overheads, restrictions on the semantic flow of information, and single-order classification limitations. In this doctoral dissertation, we introduce a new multi-faceted classification technique for performing structural analysis on knowledge domain clusters, using a novel Ontology Guided Data Linkage (OGDL) framework. To support self-organization of contributing databases through the discovery of structural dependencies, we introduce a series of algorithms that exploit ontological domain knowledge at multiple levels: tables, attributes and tuples. These techniques help automate the discovery of schema structures across multiple databases, based on direct and weighted correlations between different ontological concepts, and use a novel h-gram (hash gram) record matching technique for concept clustering and cluster mapping. Moreover, through a set of accuracy, performance and scalability experiments run on real-world datasets, we demonstrate the feasibility of our OGDL algorithms and show that our framework runs in polynomial time and performs well in practice.
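The abstract does not define the h-gram technique, so the following is only a minimal sketch of the general idea its name suggests: comparing two values by the overlap of their hashed character q-grams. It is not the dissertation's algorithm. The function names, the gram length `q=3`, the use of MD5 as a stable hash, and the Jaccard measure are all assumptions made for illustration.

```python
import hashlib


def qgrams(text, q=3):
    """Return the set of character q-grams of a whitespace-normalized,
    lower-cased string (illustrative normalization, not from the source)."""
    s = " ".join(text.lower().split())
    if not s:
        return set()
    if len(s) < q:
        return {s}
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def hashed_grams(text, q=3):
    """Map each q-gram to a 32-bit integer using a stable hash, so that
    grams can be stored and compared as fixed-size keys."""
    hashed = set()
    for gram in qgrams(text, q):
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        hashed.add(int.from_bytes(digest[:4], "big"))
    return hashed


def similarity(a, b, q=3):
    """Jaccard overlap of the two hashed q-gram sets, in [0.0, 1.0]."""
    ha, hb = hashed_grams(a, q), hashed_grams(b, q)
    if not ha and not hb:
        return 1.0  # two empty values are treated as identical
    return len(ha & hb) / len(ha | hb)


if __name__ == "__main__":
    # Hypothetical attribute names from two different schemas.
    print(similarity("patient_name", "Patient Name"))
    print(similarity("patient_name", "ward_code"))
```

Hashing the grams keeps each key a fixed size regardless of gram length, which is one plausible reason to combine hashing with q-grams when matching millions of records; a score above some chosen threshold would then flag two attributes or values as candidates for the same cluster.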
Data Linkage is an important enabling technology in eHealth because linked data offers a cost-effective way to translate research outcomes into health policies, detect adverse drug reactions, reduce costs, and uncover non-practices within the health system. Hence, to illustrate the efficiency and effectiveness of OGDL in real-world applications, we use the clinical risk management domain as our practical example throughout. To this end, we further extended our OGDL framework and introduced a composite clinical risk management success indicator data linkage, which consists of clinical risk factors combined with clinical resource and intervention factors that have been shown to be as-