Inferactive data analysis
Author(s) - Bi Nan, Markovic Jelena, Xia Lucy, Taylor Jonathan
Publication year - 2020
Publication title - Scandinavian Journal of Statistics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.359
H-Index - 65
eISSN - 1467-9469
pISSN - 0303-6898
DOI - 10.1111/sjos.12425
Subject(s) - inference, exploratory data analysis, computer science, statistic, fiducial inference, bayesian probability, data mining, test statistic, statistical inference, mathematics, sampling distribution, frequentist inference, bayesian inference, machine learning, statistical hypothesis testing, artificial intelligence, statistics
Abstract - We describe inferactive data analysis, so‐named to denote an interactive approach to data analysis with an emphasis on inference after data analysis. Our approach is a compromise between Tukey's exploratory and confirmatory data analysis, allowing also for Bayesian data analysis. We see this as a useful step in providing concrete tools (with statistical guarantees) for current data scientists. The basis of inference we use is (a conditional approach to) selective inference, in particular its randomized form. The relevant reference distributions are constructed from what we call a DAG‐DAG (a Data Analysis Generative DAG), and a selective change of variables formula is crucial to any practical implementation of inferactive data analysis via sampling these distributions. We discuss a canonical example of an incomplete cross‐validation test statistic to discriminate between black box models, and a real HIV dataset example to illustrate inference after making multiple queries on data.