An Efficient Data Access Approach With Queue and Stack in Optimized Hybrid Join
Author(s) - Omer Aziz, Tayyaba Anees, Erum Mehmood
Publication year - 2021
Publication title - IEEE Access
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.587
H-Index - 127
ISSN - 2169-3536
DOI - 10.1109/access.2021.3064202
Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation
As rapid decision making in business organizations gains in popularity, the complexity and adaptability demanded of the extract, transform, and load (ETL) process in near-real-time data warehousing have increased dramatically. The most important task of a near-real-time data warehouse is to feed in new data from different data sources on a near-real-time basis. However, this new data is not in the format of the data warehouse; it must therefore be transformed into the required format by transformation algorithms, which are an essential part of the ETL process. A semi-stream join algorithm is required to implement this transformation, and for this purpose the HYBRIDJOIN (hybrid join) algorithm has been presented in the literature. However, a major design issue with this algorithm is that it uses a single buffer to load the disk partitions, so the algorithm has to wait until the next disk partition overwrites the existing partition in the disk buffer. Because loading a disk partition into the disk buffer is the major component of the algorithm's overall processing cost, this leaves its performance sub-optimal. Moreover, existing approaches consider only the oldest join-key attribute, held in a queue of join-key attributes, when finding matches with the master data. Performance can be improved if the most recent and the oldest attributes are processed in parallel. This article addresses these limitations of HYBRIDJOIN by presenting two new optimized algorithms: Parallel-Hybrid Join (P-HYBRIDJOIN) and Hybrid Join with Queue and Stack (QaS-HYBRIDJOIN). The proposed algorithms aim to reduce the major processing cost, which is disk I/O, and to increase the number of matching stream tuples. Both algorithms perform significantly better than existing approaches in terms of throughput and number of matching tuples. The performance analysis and cost model for the proposed algorithms show the best performance using intermittent stream data under limited resources.
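
The queue-and-stack idea described in the abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the function names, the key-range partitioning scheme, the dictionary standing in for the disk-resident master data, and the batch (rather than continuous) stream are all assumptions made for illustration. The sketch probes the master data with both the oldest pending join key (queue end) and the most recent one (stack end), and counts how many simulated partition loads are needed.

```python
from collections import deque, defaultdict

def partition_of(key, partition_size):
    # Hypothetical partitioning: master keys are grouped by integer key range.
    return key // partition_size

def qas_join_sketch(stream, master, partition_size=4):
    """Join (key, payload) stream tuples against `master` (dict: key -> value).

    `master` stands in for a disk-resident relation; each partition probe
    below simulates one disk-partition load into a buffer.
    """
    pending = defaultdict(list)  # hash table: join key -> waiting payloads
    keys = deque()               # arrival order: left = oldest, right = newest
    for key, payload in stream:
        pending[key].append(payload)
        keys.append(key)

    output, loads = [], 0
    while keys:
        oldest, newest = keys[0], keys[-1]
        # Probe with the oldest key (queue) and the newest key (stack).
        parts = [partition_of(oldest, partition_size)]
        newest_part = partition_of(newest, partition_size)
        if newest_part != parts[0]:
            parts.append(newest_part)

        for part in parts:
            loads += 1  # one simulated partition load
            start = part * partition_size
            for k in range(start, start + partition_size):
                if k in master and k in pending:
                    for payload in pending.pop(k):
                        output.append((k, payload, master[k]))

        # Probe keys with no master match are discarded as unmatched.
        pending.pop(oldest, None)
        pending.pop(newest, None)
        # Keep only keys that are still waiting for a match.
        keys = deque(k for k in keys if k in pending)
    return output, loads

if __name__ == "__main__":
    stream = [(1, "a"), (9, "b"), (2, "c"), (1, "d"), (100, "e")]
    master = {1: "M1", 2: "M2", 9: "M9"}
    print(qas_join_sketch(stream, master))
```

In this simplified form the two probes run one after the other; the abstract's point about processing recent and oldest attributes in parallel, and about avoiding waits on a single disk buffer, would need separate buffers and concurrent loading, which the sketch does not model.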
