Renewable estimation and incremental inference in generalized linear models with streaming data sets
Author(s) - Luo Lan, Song Peter X.K.
Publication year - 2020
Publication title - Journal of the Royal Statistical Society: Series B (Statistical Methodology)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 6.523
H-Index - 137
eISSN - 1467-9868
pISSN - 1369-7412
DOI - 10.1111/rssb.12352
Subject(s) - inference , estimator , computer science , consistency (knowledge bases) , statistical inference , wald test , data mining , algorithm , statistical hypothesis testing , mathematics , statistics , artificial intelligence
Summary The paper presents an incremental updating algorithm for analysing streaming data sets with generalized linear models. The proposed method is formulated within a new framework of renewable estimation and incremental inference, in which the maximum likelihood estimator is renewed using the current data and summary statistics of the historical data. The framework can be implemented in Apache Spark, a popular distributed computing environment, to scale up computation. The rho architecture, which consists of two data-processing layers, enables us to accommodate inference-related statistics and to facilitate sequential updating of the statistics used in both estimation and inference. We establish estimation consistency and asymptotic normality of the proposed renewable estimator and use the Wald test for incremental inference. The methods are examined and illustrated through numerical examples from both simulation experiments and a real-world data analysis.
