Assessing Bayesian Semi‐Parametric Log‐Linear Models: An Application to Disclosure Risk Estimation
Author(s) - Carota Cinzia, Filippone Maurizio, Polettini Silvia
Publication year - 2022
Publication title - International Statistical Review
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.051
H-Index - 54
eISSN - 1751-5823
pISSN - 0306-7734
DOI - 10.1111/insr.12471
Subject(s) - computer science, contingency table, context (archaeology), model selection, bayesian probability, dirichlet process, data mining, parametric statistics, bayesian information criterion, econometrics, statistics, machine learning, mathematics, artificial intelligence, paleontology, biology
Summary - We propose a method for identifying models with good predictive performance in the family of Bayesian log‐linear mixed models with Dirichlet process random effects for count data. Their wide applicability makes the assessment of model performance crucial in many fields, including disclosure risk estimation, which is the focus of the present work. Rather than assessing models on the whole contingency table, we target the specific objective of the analysis and propose a two‐stage model selection procedure aimed at limiting a form of bias arising in the process of model selection. Our proposal combines two different criteria: at the first stage, a path in the model search space is identified through a strongly penalized log‐likelihood; at the second, a small number of semi‐parametric models is evaluated through a context‐dependent score‐based information criterion. Tested on a variety of contingency tables, our method proves to be able to identify models with good predictive performance in a few steps, even in the presence of large tables with many sampling and structural zeros. We carefully discuss the proposed method in the context of the literature on model assessment and contextualize the illustrative application in the recent debate on statistical disclosure limitation. Finally, we provide examples of further applications in different research areas.