
MODEL DIAGNOSTICS FOR BAYESIAN NETWORKS
Author(s) -
Sinharay, Sandip
Publication year - 2004
Publication title -
ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/j.2333-8504.2004.tb01944.x
Subject(s) - Guttman scale, Bayesian statistics, Bayesian probability, Bayesian networks, Bayesian inference, posterior probability, raw score, statistics
Assessing the fit of psychometric models has always been an issue of enormous interest, but there exists no unanimously agreed-upon item fit diagnostic for these models. Bayesian networks, frequently used in educational assessments (see, for example, Mislevy, Almond, Yan, & Steinberg, 2001) primarily for learning about students' knowledge and skills, are no exception. This paper employs the posterior predictive model checking method (Guttman, 1967; Rubin, 1984), a popular Bayesian model checking tool, to assess the fit of simple Bayesian networks. A number of aspects of model fit, those of usual interest to practitioners, are assessed using various diagnostic tools. The first diagnostic is direct data display: a visual comparison of the observed data set with a number of posterior predictive data sets (data sets predicted by the model). The second aspect examined is item fit. Examinees are grouped into a number of equivalence classes based on the generated values of their skill variables, and the observed and expected proportion-correct scores on an item for these classes are combined into a χ²-type and a G²-type test statistic for each item. Another (similar) set of χ²-type and G²-type test statistics is obtained by grouping the examinees by their raw scores and then comparing their observed and expected proportion-correct scores on an item. The paper also suggests how to obtain posterior predictive p-values, natural candidate p-values from a Bayesian viewpoint, for the χ²-type and G²-type test statistics. The paper further examines the association among the items, in particular whether the model can explain the odds ratios corresponding to responses to pairs of items.
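The item fit check described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function names (`chi2_item_fit`, `ppp_value`) and the exact form of the statistic, a Pearson-style sum of squared standardized differences between observed and model-expected proportion-correct per group, are assumptions for illustration. The posterior predictive p-value is simply the fraction of replicated data sets whose discrepancy meets or exceeds the observed one.

```python
import numpy as np

def chi2_item_fit(resp, groups, expected_p):
    """Chi-square-type item fit statistic for one item.

    resp       : 0/1 responses of the N examinees to the item
    groups     : group label per examinee (e.g. equivalence class
                 from the generated skill variables, or raw score)
    expected_p : dict mapping group label -> model-implied
                 proportion correct for that group
    """
    stat = 0.0
    for g, e in expected_p.items():
        mask = groups == g
        n_g = mask.sum()
        if n_g == 0:
            continue
        obs = resp[mask].mean()  # observed proportion correct in group g
        # squared standardized difference, weighted by group size
        stat += n_g * (obs - e) ** 2 / (e * (1.0 - e))
    return stat

def ppp_value(d_obs, d_rep):
    """Posterior predictive p-value: P(D(rep) >= D(obs)),
    estimated over the posterior predictive replications."""
    return float(np.mean(np.asarray(d_rep) >= d_obs))
```

In a full posterior predictive check, `chi2_item_fit` would be evaluated once on the observed data and once on each replicated data set drawn from the posterior predictive distribution, and `ppp_value` would compare the two collections; extreme p-values (near 0 or 1) flag the item as poorly fit.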
Finally, in an effort to examine the issue of differential item functioning (DIF), this paper suggests a version of the Mantel-Haenszel statistic (Holland, 1985), which uses “matched groups” based on equivalence classes, as a discrepancy measure with posterior predictive model checking. Limited simulation studies and a real data application examine the effectiveness of the suggested model diagnostics.
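The Mantel-Haenszel discrepancy measure can be sketched as follows. This is a generic implementation of the standard Mantel-Haenszel common odds ratio, assuming each stratum is one of the equivalence-class matched groups mentioned above; the function name and argument layout are illustrative, not the paper's code. A common odds ratio near 1 indicates no DIF for the item; in the posterior predictive setting, the statistic computed on the observed data would be compared against its distribution over replicated data sets.

```python
import numpy as np

def mantel_haenszel_or(resp, group, strata):
    """Mantel-Haenszel common odds ratio across matched strata.

    resp   : 0/1 item responses
    group  : 0 = reference group, 1 = focal group
    strata : stratum label per examinee (here: equivalence class)
    """
    num = den = 0.0
    for s in np.unique(strata):
        m = strata == s
        a = np.sum((group[m] == 0) & (resp[m] == 1))  # reference, correct
        b = np.sum((group[m] == 0) & (resp[m] == 0))  # reference, incorrect
        c = np.sum((group[m] == 1) & (resp[m] == 1))  # focal, correct
        d = np.sum((group[m] == 1) & (resp[m] == 0))  # focal, incorrect
        t = a + b + c + d
        if t == 0:
            continue
        num += a * d / t
        den += b * c / t
    return num / den if den > 0 else float("nan")
```

Matching on model-based equivalence classes, rather than on raw score as in the classical Mantel-Haenszel procedure, is what ties this discrepancy measure to the fitted Bayesian network.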