Parallel rule‐based selective sampling and on‐demand learning to rank
Author(s) -
Freitas Mateus F.,
Sousa Daniel X.,
Martins Wellington S.,
Rosa Thierson C.,
Silva Rodrigo M.,
Gonçalves Marcos A.
Publication year - 2018
Publication title -
Concurrency and Computation: Practice and Experience
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.309
H-Index - 67
eISSN - 1532-0634
pISSN - 1532-0626
DOI - 10.1002/cpe.4464
Subject(s) - computer science , scalability , speedup , ranking (information retrieval) , rank (graph theory) , exploit , learning to rank , baseline (sea) , implementation , parallel computing , parallelism (grammar) , machine learning , artificial intelligence , database , programming language , mathematics , combinatorics , oceanography , computer security , geology
Summary Learning to rank (L2R) works by constructing a ranking model from training data so that, given a new query, the model is able to generate an effective ranking of the objects for that query. Almost all work in L2R focuses on ranking accuracy, leaving performance and scalability overlooked. However, performance is a critical factor, especially when dealing with on‐demand queries. In this scenario, learning to rank using association rules has been shown to be extremely effective, but only at a high computational cost. In this work, we show how to exploit parallelism in rule‐based systems to (i) drastically reduce L2R training datasets using selective sampling and (ii) generate query‐customized ranking models on the fly. We present parallel algorithms and GPU implementations for these two tasks, showing that dataset reduction takes only a few seconds, with speedups of up to 148x over a serial baseline, and that queries can be processed in only a few milliseconds, with speedups of 1000x over a serial baseline and 29x over a parallel baseline in the best case. We also extend the implementations to work with multiple GPUs, further increasing the speedups over the baselines and demonstrating the scalability of our proposed algorithms.
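To make the two ideas in the abstract concrete, the sketch below illustrates (i) selective sampling of a query's training documents and (ii) an on‐demand, rule‐based scorer built from that reduced sample. This is not the authors' GPU implementation: the function names (bin_edges, discretize, selective_sample, mine_rules, score), the single‐condition rule format, and all parameters are hypothetical simplifications, and the parallel and multi‐GPU aspects of the paper are omitted entirely.

```python
# Illustrative sketch only (serial, simplified); not the paper's method or code.
import numpy as np

def bin_edges(X, n_bins=5):
    """Quantile cut points per feature, so rules can refer to (feature, bin) conditions."""
    return np.quantile(X, np.linspace(0, 1, n_bins + 1)[1:-1], axis=0)

def discretize(X, edges):
    """Map each feature value to its bin index using precomputed edges."""
    return np.array([np.searchsorted(edges[:, j], X[:, j]) for j in range(X.shape[1])]).T

def selective_sample(Xb, y, keep=0.2, seed=0):
    """Keep all relevant documents and a random fraction of the (far more numerous)
    non-relevant ones -- one very simple form of selective sampling."""
    rng = np.random.default_rng(seed)
    rel = np.flatnonzero(y > 0)
    nonrel = np.flatnonzero(y == 0)
    sampled = rng.choice(nonrel, size=max(1, int(keep * len(nonrel))), replace=False)
    idx = np.concatenate([rel, sampled])
    return Xb[idx], y[idx]

def mine_rules(Xb, y, min_support=3):
    """Mine single-condition rules (feature, bin) -> mean relevance, the simplest
    flavour of association-rule ranking, from the reduced training sample."""
    rules = {}
    for j in range(Xb.shape[1]):
        for b in np.unique(Xb[:, j]):
            mask = Xb[:, j] == b
            if mask.sum() >= min_support:
                rules[(j, b)] = y[mask].mean()
    return rules

def score(Xb_test, rules):
    """Score each test document by averaging the consequents of the rules
    whose antecedents it satisfies."""
    scores = np.zeros(len(Xb_test))
    for i, row in enumerate(Xb_test):
        hits = [rules[(j, row[j])] for j in range(len(row)) if (j, row[j]) in rules]
        scores[i] = np.mean(hits) if hits else 0.0
    return scores

# Toy usage: 200 training documents with 10 features and binary relevance labels.
rng = np.random.default_rng(42)
X_train, y_train = rng.random((200, 10)), (rng.random(200) > 0.8).astype(int)
X_test = rng.random((20, 10))
edges = bin_edges(X_train)
Xb_small, y_small = selective_sample(discretize(X_train, edges), y_train)
ranking = np.argsort(-score(discretize(X_test, edges), mine_rules(Xb_small, y_small)))
print(ranking[:5])  # indices of the top-5 documents for this query
```

In the paper, both the sampling step and the per‐query rule generation above are the parts that are parallelized on the GPU; this sketch only conveys the data flow between them.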