Declarative Parameterizations of User-Defined Functions for Large-Scale Machine Learning and Optimization

Zekai J. Gao; Niketan Pansare; Christopher Jermaine

Publication year - 2018

Publication title -

IEEE Transactions on Knowledge and Data Engineering

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 1.36

H-Index - 174

eISSN - 1558-2191

pISSN - 1041-4347

DOI - 10.1109/tkde.2018.2873325

Large-scale optimization has become an important application for data management systems, particularly in the context of statistical machine learning. In this paper, we consider how one might implement the join-and-co-group pattern, which is ubiquitous in iterative, large-scale optimization, in the context of a fully declarative data processing system. In the join-and-co-group pattern, a user-defined function $g$ is parameterized with a data object $x$ as well as with the subset $\Theta_x$ of the statistical model that applies to that object, so that $g(x \mid \Theta_x)$ can be used to compute a partial update of the model. This is repeated for every $x$ in the full data set $X$. All partial updates are then aggregated and used to perform a complete update of the model. The join-and-co-group pattern poses several implementation challenges, including the potential for a massive blow-up in the size of a fully parameterized model. Thus, unless the correct physical execution plan is chosen for implementing the pattern, an execution can easily take a very long time or even fail to complete. We carefully consider the alternatives for implementing the join-and-co-group pattern on top of a declarative system, as well as how the best alternative can be selected automatically. Our focus is on the SimSQL database system, an SQL-based system with special facilities for large-scale, iterative optimization. Because SimSQL is SQL-based and has a query optimizer, these choices can be made automatically.
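To make the pattern concrete, here is a minimal in-memory Python sketch. It is not SimSQL code and not the authors' implementation. It assumes a sparse least-squares model purely for illustration. Each data object is a hypothetical pair of sparse features and a target. The hypothetical `parameterize` function plays the role of the join that attaches $\Theta_x$ to $x$, and the hypothetical UDF `g` returns a partial gradient. The per-parameter sums in `join_and_cogroup_step` (also a hypothetical name) play the role of the co-group aggregation. The `materialized` counter tracks $\sum_{x \in X} |\Theta_x|$, the number of parameter copies a fully parameterized model would hold.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical representation: a data object is (sparse features {feature_id: value}, target y).
DataObject = Tuple[Dict[int, float], float]
Model = Dict[int, float]


def parameterize(x: DataObject, theta: Model) -> Model:
    """Join step (sketch): extract Theta_x, the part of the model that x touches."""
    features, _ = x
    return {j: theta[j] for j in features}


def g(x: DataObject, theta_x: Model) -> Dict[int, float]:
    """Hypothetical UDF: least-squares gradient of one object w.r.t. Theta_x only."""
    features, y = x
    residual = sum(v * theta_x[j] for j, v in features.items()) - y
    return {j: residual * v for j, v in features.items()}


def join_and_cogroup_step(
    data: List[DataObject], theta: Model, lr: float
) -> Tuple[Model, int]:
    """One iteration: parameterize each x, apply g, aggregate, then update the model."""
    totals: Dict[int, float] = defaultdict(float)
    materialized = 0  # total parameter copies attached to data objects in this pass
    for x in data:
        theta_x = parameterize(x, theta)
        materialized += len(theta_x)
        for j, partial in g(x, theta_x).items():
            totals[j] += partial  # co-group: sum partial updates per parameter
    n = len(data)
    # Complete model update: averaged gradient step on every parameter.
    new_theta = {j: w - lr * totals.get(j, 0.0) / n for j, w in theta.items()}
    return new_theta, materialized


# Usage: three objects, each touching two of three parameters.
data: List[DataObject] = [
    ({0: 1.0, 1: 2.0}, 5.0),
    ({1: 1.0, 2: 1.0}, 3.0),
    ({0: 1.0, 2: 3.0}, 4.0),
]
theta: Model = {0: 0.0, 1: 0.0, 2: 0.0}
copies = 0
for _ in range(2000):
    theta, copies = join_and_cogroup_step(data, theta, lr=0.1)

print(copies)  # 6
print({j: round(w, 3) for j, w in theta.items()})  # {0: 1.0, 1: 2.0, 2: 1.0}
```

In this toy run each of the three objects carries two parameters, so six parameter copies are materialized for a three-parameter model. When many objects share popular parameters, $\sum_{x \in X} |\Theta_x|$ can far exceed $|\Theta|$, which is the kind of blow-up that makes the choice of physical plan matter.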