Open Access

An Efficient Data Access Approach With Queue and Stack in Optimized Hybrid Join

Author(s) - Omer Aziz, Tayyaba Anees, Erum Mehmood

Publication year - 2021

Publication title - IEEE Access

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.587

H-Index - 127

ISSN - 2169-3536

DOI - 10.1109/access.2021.3064202

Subject(s) - aerospace , bioengineering , communication, networking and broadcast technologies , components, circuits, devices and systems , computing and processing , engineered materials, dielectrics and plasmas , engineering profession , fields, waves and electromagnetics , general topics for engineers , geoscience , nuclear engineering , photonics and electrooptics , power, energy and industry applications , robotics and control systems , signal processing and analysis , transportation

As rapid decision making in business organizations gains in popularity, the complexity and adaptability of the extract, transform, and load (ETL) process in near-real-time data warehousing have dramatically increased. The most important task in a near-real-time data warehouse is feeding in new data from different data sources on a near-real-time basis. However, this new data is not in the format of the data warehouse; therefore, it must be transformed into the required format using transformation algorithms, which are an essential part of the ETL process. A semi-stream join algorithm is required to implement this transformation, and for this purpose the HYBRIDJOIN (hybrid join) algorithm has been presented in the literature. However, a major design issue with this algorithm is that it uses a single buffer to load the disk partitions, and therefore the algorithm has to wait until the next disk partition overwrites the existing partition in the disk buffer. Because loading a disk partition into the disk buffer is the dominant component of the algorithm's overall processing cost, this leaves its performance sub-optimal. Moreover, existing approaches consider only the oldest join key attributes when finding matches with master data and maintaining the queue of join key attributes. However, performance can be improved if the most recent and the oldest attributes are processed in parallel. This article addresses these limitations of HYBRIDJOIN by presenting two new optimized algorithms: Parallel-Hybrid Join (P-HYBRIDJOIN) and Hybrid Join with Queue and Stack (QaS-HYBRIDJOIN). The proposed algorithms aim to reduce the major processing cost, namely disk I/O, and to increase the number of matching stream tuples. Both algorithms perform significantly better than existing approaches in terms of throughput and number of matching tuples. Performance analysis and the cost model for the proposed algorithms show the best performance on intermittent stream data under limited resources.
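
The idea of probing master data from both ends of the pending-key structure can be illustrated with a small sketch. This is not the authors' QaS-HYBRIDJOIN, whose details the abstract does not give; it is a toy, single-threaded approximation under stated assumptions: master data is an in-memory dict standing in for disk, partitions are fixed key ranges, and the names `qas_join_sketch`, `partition_for`, `partition_size`, and `batch` are all hypothetical.

```python
from collections import defaultdict, deque


def partition_for(key, partition_size):
    """Index of the (hypothetical) disk partition that holds `key`."""
    return key // partition_size


def qas_join_sketch(stream, master, partition_size=4, batch=3):
    """Toy semi-stream join: stream tuples (key, payload) against master data.

    Assumption: each iteration loads the partition selected by the oldest
    pending key (queue end) and the one selected by the newest pending key
    (stack end). Returns (joined tuples, number of partition loads).
    """
    # Simulated disk: master rows grouped into key-range partitions.
    disk = defaultdict(dict)
    for key, value in master.items():
        disk[partition_for(key, partition_size)][key] = value

    pending = defaultdict(list)  # hash table: join key -> waiting payloads
    order = deque()              # pending keys in arrival order
    output, loads = [], 0
    source = iter(stream)
    exhausted = False

    while not exhausted or order:
        # Read up to `batch` new stream tuples into the hash table.
        for _ in range(batch):
            try:
                key, payload = next(source)
            except StopIteration:
                exhausted = True
                break
            if key not in pending:
                order.append(key)
            pending[key].append(payload)
        if not order:
            continue

        # Left end acts as the queue (oldest), right end as the stack (newest).
        probes = {
            partition_for(order[0], partition_size),
            partition_for(order[-1], partition_size),
        }
        for p in probes:
            loads += 1  # one simulated disk read per distinct partition
            for key, value in disk.get(p, {}).items():
                for payload in pending.pop(key, []):
                    output.append((key, payload, value))

        # Keys in a probed partition are resolved, matched or not.
        for key in list(pending):
            if partition_for(key, partition_size) in probes:
                del pending[key]
        order = deque(
            k for k in order if partition_for(k, partition_size) not in probes
        )

    return output, loads
```

Calling `qas_join_sketch(stream, master)` with a list of `(key, payload)` pairs and a dict of master rows returns the joined tuples together with the count of simulated partition loads, which is the quantity the abstract identifies as the dominant cost. A real implementation would read partitions from disk and, as the abstract suggests, process the two ends in parallel rather than in one loop.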
