Supervised compression of big data
Author(s) -
V. Roshan Joseph,
Simon Mak
Publication year - 2021
Publication title -
Statistical Analysis and Data Mining: The ASA Data Science Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.381
H-Index - 33
eISSN - 1932-1872
pISSN - 1932-1864
DOI - 10.1002/sam.11508
Subject(s) - computer science , data mining , big data , machine learning , data compression , data modeling , parametric statistics , nonparametric statistics , artificial intelligence , statistics , mathematics
Abstract - The phenomenon of big data has become ubiquitous in nearly all disciplines, from science to engineering. A key challenge is the use of such data for fitting statistical and machine learning models, which can incur high computational and storage costs. One solution is to perform model fitting on a carefully selected subset of the data. Various data reduction methods have been proposed in the literature, ranging from random subsampling to optimal experimental design-based methods. However, when the goal is to learn the underlying input–output relationship, such reduction methods may not be ideal, since they do not make use of information contained in the output. To this end, we propose a supervised data compression method called supercompress, which integrates output information by sampling data from regions most important for modeling the desired input–output relationship. An advantage of supercompress is that it is nonparametric: the compression method does not rely on parametric modeling assumptions between inputs and output. As a result, the proposed method is robust to a wide range of modeling choices. We demonstrate the usefulness of supercompress over existing data reduction methods, in both simulations and a taxicab predictive modeling application.
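To make the core idea concrete, the sketch below clusters data in the joint input–output space, so that the compressed representatives concentrate in regions where the response varies. This is a minimal illustration of the general supervised-compression idea only, not the authors' exact supercompress algorithm; the weight `lam` and the plain Lloyd's-iteration k-means are assumptions introduced here for illustration.

```python
import numpy as np

def supervised_compress(X, y, k, lam=1.0, n_iter=50, seed=0):
    """Compress (X, y) to k representative points by clustering in the
    joint input-output space. Illustrative sketch only, NOT the exact
    supercompress method; `lam` weights the output dimension."""
    rng = np.random.default_rng(seed)
    # Augment inputs with the (weighted) output so clusters respect y.
    Z = np.hstack([X, lam * y.reshape(-1, 1)])
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each point to its nearest center (Lloyd's algorithm).
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            pts = Z[labels == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    # Split the centers back into compressed inputs and outputs.
    Xc = centers[:, :-1]
    yc = centers[:, -1] / lam
    return Xc, yc

# Toy usage: compress 1000 noisy points to 20 representatives.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(1000, 2))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(1000)
Xc, yc = supervised_compress(X, y, k=20, lam=2.0)
print(Xc.shape, yc.shape)  # (20, 2) (20,)
```

A downstream model (e.g., kernel regression) would then be fit on the 20 compressed points instead of the full 1000, which is the computational saving the paper targets. Setting `lam=0` recovers purely unsupervised compression of the inputs.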