A distributed kernel summation framework for general‐dimension machine learning
Author(s) -
Dongryeol Lee,
Piyush Sao,
Richard Vuduc,
Alexander G. Gray
Publication year - 2014
Publication title -
Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11207
Subject(s) - computer science, scalability, kernel (algebra), theoretical computer science, parallel computing, bottleneck, distributed memory, kernel method, computation, dimension (graph theory), algorithm, shared memory, machine learning, support vector machine, mathematics, combinatorics, database, pure mathematics, embedded system
Kernel summations are a ubiquitous computational bottleneck in many data analysis methods. In this paper, we attempt to marry, for the first time, the best relevant techniques from parallel computing, where kernel summations are computed in low dimensions, with the best general-dimension algorithms from the machine learning literature. We provide the first distributed implementation of a kernel summation framework that can utilize: (i) various types of deterministic and probabilistic approximations suitable for both low- and high-dimensional problems with large numbers of data points; (ii) any multidimensional binary tree, using both distributed-memory and shared-memory parallelism; and (iii) a dynamic load-balancing scheme that adjusts work imbalances during the computation. Our hybrid message passing interface (MPI)/OpenMP codebase provides a general framework for accelerating the computation of many popular machine learning methods. Our experiments show scalability results for kernel density estimation on a synthetic ten-dimensional dataset containing over one billion points and on a subset of the Sloan Digital Sky Survey data, using up to 6144 cores. © 2013 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2013
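For context, the kernel summations this framework accelerates have the form G(q_j) = sum_i w_i K(q_j, x_i) for each query point q_j; evaluated naively over M queries and N reference points this costs O(M*N), which is the bottleneck the paper's tree-based, distributed approximations are designed to avoid. The sketch below is a minimal, illustrative NumPy version of the naive Gaussian kernel summation (kernel density estimation with uniform weights); it is not the authors' MPI/OpenMP code, and the function name and bandwidth parameter are placeholders chosen for this example.

import numpy as np

def naive_kernel_sum(queries, references, bandwidth=1.0):
    """Naive O(M*N) Gaussian kernel summation: for each query q,
    compute sum_i exp(-||q - x_i||^2 / (2 h^2)).

    Illustrative sketch only; the paper replaces this quadratic-cost
    loop with tree-based deterministic/probabilistic approximations
    run over MPI and OpenMP."""
    # Pairwise squared distances between M queries and N references.
    diffs = queries[:, None, :] - references[None, :, :]   # shape (M, N, D)
    sq_dists = np.einsum('mnd,mnd->mn', diffs, diffs)      # shape (M, N)
    # Sum the Gaussian kernel over all reference points per query.
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).sum(axis=1)

# Toy run on synthetic ten-dimensional data (the paper's experiments
# use up to a billion points; this is only a small demonstration).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 10))
print(naive_kernel_sum(X[:5], X, bandwidth=0.5))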