RPK-table based efficient algorithm for join-aggregate query on MapReduce
Author(s) -
Zhan Li,
Qi Feng,
Wei Chen,
Tengjiao Wang
Publication year - 2016
Publication title -
CAAI Transactions on Intelligence Technology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.613
H-Index - 15
eISSN - 2468-6557
pISSN - 2468-2322
DOI - 10.1016/j.trit.2016.03.008
Subject(s) - computer science , join (topology) , aggregate (composite) , key (lock) , overhead (engineering) , table (database) , query optimization , online aggregation , process (computing) , data mining , database , materialized view , sargable , distributed computing , view , information retrieval , web search query , database design , search engine , operating system , mathematics , materials science , combinatorics , composite material
Join-aggregate is an important and widely used operation in database systems. However, processing join-aggregate queries is time-consuming in big data environments, especially on the MapReduce framework. There are two main bottlenecks: heavy I/O caused by temporary data, and high communication overhead between data nodes during query processing. To overcome these bottlenecks, we design a data structure called the Reference Primary Key table (RPK-table), which stores the primary key and foreign key relationships between tables. Based on this structure, we propose an improved algorithm for join-aggregate queries on the MapReduce framework. Experiments on the TPC-H dataset demonstrate that our algorithm outperforms existing methods in terms of communication cost and query response time.
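The abstract does not spell out the layout of the RPK-table or the steps of the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the key relationship can be held as a plain lookup from an order's primary key to the customer key it references, and it simulates the map and reduce phases in a single Python process. The function names and the sample rows are hypothetical; the column names follow the TPC-H schema.

```python
from collections import defaultdict


def build_rpk_table(orders):
    """Assumed stand-in for an RPK-table: order primary key -> referenced customer key."""
    return {row["o_orderkey"]: row["o_custkey"] for row in orders}


def map_phase(lineitem_split, rpk):
    """Resolve the join through the lookup and pre-aggregate within one split.

    Only one (custkey, partial_sum) pair per group leaves the mapper, so no
    joined intermediate rows are written out or shuffled.
    """
    partial = defaultdict(float)
    for row in lineitem_split:
        custkey = rpk.get(row["l_orderkey"])
        if custkey is not None:  # inner-join semantics: drop unmatched rows
            partial[custkey] += row["l_extendedprice"]
    return list(partial.items())


def reduce_phase(mapped_outputs):
    """Merge the partial sums produced by all mappers."""
    totals = defaultdict(float)
    for pairs in mapped_outputs:
        for custkey, value in pairs:
            totals[custkey] += value
    return dict(totals)


def join_aggregate(orders, lineitem_splits):
    """SUM(l_extendedprice) GROUP BY o_custkey over orders JOIN lineitem."""
    rpk = build_rpk_table(orders)
    return reduce_phase(map_phase(split, rpk) for split in lineitem_splits)


# Hypothetical sample data, not taken from the paper's experiments.
orders = [
    {"o_orderkey": 1, "o_custkey": 10},
    {"o_orderkey": 2, "o_custkey": 20},
    {"o_orderkey": 3, "o_custkey": 10},
]
lineitem_splits = [
    [{"l_orderkey": 1, "l_extendedprice": 5.0},
     {"l_orderkey": 2, "l_extendedprice": 7.0}],
    [{"l_orderkey": 3, "l_extendedprice": 2.5},
     {"l_orderkey": 4, "l_extendedprice": 1.0},  # no matching order
     {"l_orderkey": 1, "l_extendedprice": 1.5}],
]
result = join_aggregate(orders, lineitem_splits)
```

In this sketch the two bottlenecks named in the abstract are addressed in the simplest possible way: the join is answered from the key lookup rather than from materialized temporary data, and each mapper ships only its per-group partial sums to the reducer.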