Utilizing the column imprints to accelerate no‐partitioning hash joins in large‐scale edge systems
Author(s) - Li Yu, Xu Wenjian
Publication year - 2021
Publication title - Transactions on Emerging Telecommunications Technologies
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.366
H-Index - 47
ISSN - 2161-3915
DOI - 10.1002/ett.4084
Subject(s) - joins , column (typography) , hash function , computer science , enhanced data rates for gsm evolution , scale (ratio) , parallel computing , artificial intelligence , computer security , geography , computer network , programming language , cartography , frame (networking)
Abstract With the increasing number of edge devices in large‐scale edge systems, more and more data are collected for processing. In such big data scenarios, there is a resurgence of interest in main‐memory analytic databases, driven by the large RAM capacity of modern servers and the growing demand for real‐time analytic platforms. In such databases, join is at the heart of almost every query plan. Join also remains a time‐consuming operation in cases where the overhead of denormalization is too large for denormalization to be applicable. However, current join implementations have not fully leveraged the features of modern hardware (e.g., SIMD instructions and multi‐core processors). The goal of this article is to design efficient join algorithms that judiciously exploit every bit of RAM and all of the parallelism available in each processing unit. Hash joins in particular have been studied, improved, and re‐examined for decades. In this article, we propose using a secondary index to improve hash joins without physical partitioning. Specifically, in the build phase, the hash values are scattered evenly into the logical partitions of the hash table; in the probe phase, the secondary index serves as a hint to reorder the probing sequence, which increases the locality of hash probes. We benchmark the proposed techniques in our column‐store research prototype. Extensive experiments on synthetic and real data show that our methods offer significant performance improvements over their counterparts.
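The abstract describes the build and probe phases only at a high level. The Python sketch below illustrates the general idea under our own assumptions and is not the authors' implementation. The partition count (`PART_BITS`), the multiplicative hash constant, the block size (`BLOCK`, standing in for a cache line), and the per-block bitmask are all hypothetical choices. The bitmask is a simplified, imprint-like stand-in: it records which logical hash partitions a block's keys fall into, whereas the original column-imprints design marks value-range histogram bins per cache line.

```python
from collections import defaultdict

PART_BITS = 2                # hypothetical: 2**2 = 4 logical partitions
NUM_PARTS = 1 << PART_BITS
BLOCK = 4                    # hypothetical block size standing in for a cache line


def partition_of(key: int) -> int:
    # Multiplicative hashing spreads keys evenly; the top PART_BITS bits
    # of the 32-bit hash select the logical partition.
    h = (key * 0x9E3779B1) & 0xFFFFFFFF
    return h >> (32 - PART_BITS)


def build(build_keys):
    """Build phase: a single hash table split into NUM_PARTS logical sub-tables
    (no physical partitioning of the input relation)."""
    table = [defaultdict(list) for _ in range(NUM_PARTS)]
    for row_id, key in enumerate(build_keys):
        table[partition_of(key)][key].append(row_id)
    return table


def imprints(probe_keys):
    """Imprint-like secondary index (simplified assumption): one bitmask per
    block, where bit p is set if some key in the block hashes to partition p."""
    masks = []
    for start in range(0, len(probe_keys), BLOCK):
        mask = 0
        for key in probe_keys[start:start + BLOCK]:
            mask |= 1 << partition_of(key)
        masks.append(mask)
    return masks


def probe(table, probe_keys, masks):
    """Probe phase: visit the hash table one logical partition at a time, using
    the masks as hints to skip blocks with no keys in that partition."""
    out = []
    for p in range(NUM_PARTS):
        part = table[p]
        for b, mask in enumerate(masks):
            if not mask & (1 << p):
                continue  # no key in this block hashes into partition p
            start = b * BLOCK
            for i in range(start, min(start + BLOCK, len(probe_keys))):
                key = probe_keys[i]
                if partition_of(key) == p:
                    # .get() avoids inserting empty lists into the defaultdict
                    for row_id in part.get(key, ()):
                        out.append((row_id, i))  # (build row, probe row)
    return out


if __name__ == "__main__":
    R = [3, 7, 11, 3, 42]
    S = [7, 3, 5, 42, 42, 8, 3, 11, 100]
    print(sorted(probe(build(R), S, imprints(S))))
```

Each probe key is examined only in the pass for its own partition, so the result matches an ordinary hash join. The masks merely let each pass skip blocks that cannot contribute, which keeps consecutive lookups inside one sub-table. The price is one scan of the mask array per partition, a trade-off that would need measuring on real hardware.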
