Efficient analytics on ordered datasets using MapReduce

Jiangtao Yin; Yong Liao; Mario Baldi; Lixin Gao; Antonio Nucci


Open Access


Author(s) - Jiangtao Yin, Yong Liao, Mario Baldi, Lixin Gao, Antonio Nucci

Publication year - 2013

Publication title - CiteSeerX (The Pennsylvania State University)

Language(s) - English

Resource type - Conference proceedings

ISBN - 978-1-4503-1910-2

DOI - 10.1145/2462902.2462930

Subject(s) - computer science, merge (version control), speedup, analytics, big data, workload, business intelligence, sort, execution time, database transaction, data mining, database, parallel computing, operating system

Efficiently analyzing data at large scale can be vital for data owners seeking useful business intelligence. Event log files are among the most common datasets used for this purpose. Records in event log files are typically sorted by time, and they often need to be grouped by user ID or transaction ID to mine user behaviors, such as click-through rate, while preserving the time order. This kind of analytical workload is referred to here as RElative Order-pReserving based Grouping (Re-Org). Using MapReduce/Hadoop, a popular big data analysis tool, as-is to execute Re-Org tasks on ordered datasets is inefficient because of its internal sort-merge mechanism. We propose a framework that adopts an efficient group-order-merge mechanism to execute Re-Org tasks faster, and we implement it by extending Hadoop. Experimental results show a 2.2x speedup over executing Re-Org tasks in plain vanilla Hadoop.
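To make the Re-Org workload concrete, the following is a minimal single-process sketch in Python. It is not the authors' Hadoop extension: the function names, the sample log, and the two-way split are hypothetical and chosen only for illustration. It assumes the input is already time sorted and is cut into contiguous splits, so grouping each split in arrival order and concatenating the per-key lists in split order preserves time order without any sort step.

```python
def group_preserving_order(records):
    """Group (key, payload) pairs by key, keeping arrival order inside each group."""
    groups = {}
    for key, payload in records:
        groups.setdefault(key, []).append(payload)
    return groups


def merge_grouped_splits(grouped_splits):
    """Merge per-split groups by concatenating them in split order.

    Assumption: the splits are contiguous slices of one time-sorted input,
    so every record in an earlier split precedes every record in a later one.
    """
    merged = {}
    for groups in grouped_splits:
        for key, payloads in groups.items():
            merged.setdefault(key, []).extend(payloads)
    return merged


# Hypothetical time-sorted event log: (timestamp, user_id, action).
log = [
    (1, "u1", "view"),
    (2, "u2", "view"),
    (3, "u1", "click"),
    (4, "u2", "view"),
    (5, "u1", "view"),
    (6, "u2", "click"),
]

# Key each record by user ID and cut the input into two contiguous splits.
records = [(user, (ts, action)) for ts, user, action in log]
splits = [records[:3], records[3:]]

result = merge_grouped_splits(group_preserving_order(split) for split in splits)
for user, events in result.items():
    print(user, events)
```

Each user's events come out in their original time order even though no comparison-based sort is performed, which is the property a Re-Org task needs.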
