
A BAYESIAN HIERARCHICAL MODEL FOR LARGE‐SCALE EDUCATIONAL SURVEYS: AN APPLICATION TO THE NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS
Author(s) - Matthew S. Johnson, Frank Jenkins
Publication year - 2004
Publication title - ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/j.2333-8504.2004.tb01965.x
Subject(s) - markov chain monte carlo, bayesian probability, computer science, bayesian statistics, sampling (signal processing), sample (material), scale (ratio), point estimation, hierarchical database model, gibbs sampling, bayesian hierarchical modeling, statistics, multilevel model, set (abstract data type), bayesian inference, sample size determination, posterior probability, data mining, mathematics, machine learning, artificial intelligence, chemistry, physics, filter (signal processing), chromatography, quantum mechanics, computer vision, programming language
Large‐scale educational assessments such as the National Assessment of Educational Progress (NAEP) sample examinees to whom an exam will be administered. In most situations the sampling design is not a simple random sample and must be accounted for in the estimating model. After reviewing the current operational estimation procedure for NAEP, this paper describes a Bayesian hierarchical model for the analysis of complex large‐scale assessments. The model clusters students within schools and schools within primary sampling units. The paper discusses an estimation procedure that uses a Markov chain Monte Carlo algorithm to approximate the posterior distribution of the model parameters. Results from two Bayesian models, one treating item parameters as known and one treating them as unknown, are compared to results from the current operational method on a simulated data set and on a subset of data from the 1998 NAEP reading assessment. The point estimates from the Bayesian model and the operational method are quite similar in most cases, but there do seem to be systematic differences in measures of uncertainty (e.g., standard errors, confidence intervals).
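To make the clustering idea concrete, the following is a minimal sketch of a Gibbs sampler for a three-level normal random-effects model (students within schools within primary sampling units). It is not the model of the paper: it assumes directly observed scores `y` rather than latent proficiencies measured through item responses, omits background covariates, and uses a flat prior on the grand mean with inverse-gamma priors on the three variances. The function name `gibbs_three_level`, its arguments, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np

def gibbs_three_level(y, school, psu_of_school, n_iter=1000, burn=200,
                      a0=1.0, b0=1.0, seed=0):
    """Gibbs sampler for y_i = mu + u[psu(i)] + v[school(i)] + e_i.

    Assumed (illustrative) priors: flat on mu, InvGamma(a0, b0) on the
    student-, school-, and PSU-level variances sigma2, tau2, omega2.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    school = np.asarray(school, dtype=int)                # school index per student
    psu_of_school = np.asarray(psu_of_school, dtype=int)  # PSU index per school
    n, S = y.size, psu_of_school.size
    P = int(psu_of_school.max()) + 1
    psu = psu_of_school[school]                           # PSU index per student
    n_s = np.bincount(school, minlength=S)                # students per school
    n_p = np.bincount(psu, minlength=P)                   # students per PSU

    mu, u, v = y.mean(), np.zeros(P), np.zeros(S)
    sigma2 = tau2 = omega2 = 1.0
    keep = {"mu": [], "sigma2": [], "tau2": [], "omega2": []}

    for it in range(n_iter):
        # Grand mean given all random effects (flat prior).
        mu = rng.normal(np.mean(y - u[psu] - v[school]), np.sqrt(sigma2 / n))

        # PSU effects: conjugate normal update from PSU-level residual sums.
        prec = n_p / sigma2 + 1.0 / omega2
        tot = np.bincount(psu, weights=y - mu - v[school], minlength=P)
        u = rng.normal(tot / sigma2 / prec, np.sqrt(1.0 / prec))

        # School effects: conjugate normal update from school-level residual sums.
        prec = n_s / sigma2 + 1.0 / tau2
        tot = np.bincount(school, weights=y - mu - u[psu], minlength=S)
        v = rng.normal(tot / sigma2 / prec, np.sqrt(1.0 / prec))

        # Variances: inverse-gamma draws obtained as reciprocals of gamma draws.
        resid = y - mu - u[psu] - v[school]
        sigma2 = 1.0 / rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * resid @ resid))
        tau2 = 1.0 / rng.gamma(a0 + S / 2, 1.0 / (b0 + 0.5 * v @ v))
        omega2 = 1.0 / rng.gamma(a0 + P / 2, 1.0 / (b0 + 0.5 * u @ u))

        if it >= burn:
            for name, val in zip(keep, (mu, sigma2, tau2, omega2)):
                keep[name].append(val)
    return {name: np.array(vals) for name, vals in keep.items()}

# Simulated example: 10 PSUs, 4 schools per PSU, 20 students per school.
sim = np.random.default_rng(1)
P, schools_per_psu, students_per_school = 10, 4, 20
psu_of_school = np.repeat(np.arange(P), schools_per_psu)
school = np.repeat(np.arange(P * schools_per_psu), students_per_school)
u_true = sim.normal(0.0, 0.5, P)
v_true = sim.normal(0.0, 0.7, P * schools_per_psu)
y = (5.0 + u_true[psu_of_school[school]] + v_true[school]
     + sim.normal(0.0, 1.0, school.size))

draws = gibbs_three_level(y, school, psu_of_school)
print({name: round(float(d.mean()), 2) for name, d in draws.items()})
```

Posterior standard deviations of the retained draws (for example `draws["mu"].std()`) give the kind of model-based uncertainty measure that the abstract contrasts with the operational standard errors. A fuller treatment would replace `y` with latent proficiencies sampled from an item response model.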