
WEIGHTING TEST SAMPLES IN IRT LINKING AND EQUATING: TOWARD AN IMPROVED SAMPLING DESIGN FOR COMPLEX EQUATING
Author(s) -
Qian Jiahe,
Jiang Yanming,
von Davier Alina A.
Publication year - 2013
Publication title -
ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/j.2333-8504.2013.tb02346.x
Subject(s) - equating, weighting, statistics, mathematics, item response theory, econometrics, sampling (signal processing), sample (material), population, sample size determination, psychometrics, demography, computer science, rasch model, medicine, chemistry, filter (signal processing), chromatography, computer vision, radiology, sociology
Several factors can introduce variability into item response theory (IRT) linking and equating procedures, including variability across examinee samples and test items, seasonality, regional differences, native language diversity, gender, and other demographic variables. This raises a question of optimal sampling design: can examinee samples be selected so that IRT linking and equating are more precise both within a single administration and across a large number of administrations? To obtain an improved sampling design for invariant linking and equating across test administrations, we applied weighting techniques to yield a weighted sample distribution that is consistent with the target population distribution. The goal is a stable Stocking‐Lord test characteristic curve (TCC) linking and a true‐score equating that are invariant across administrations. To study the effects of weighting on linking, we first selected multiple subsamples from a data set. We then compared the linking parameters from the subsamples with those from the full data set and examined whether the linking parameters from the weighted subsamples yielded smaller mean square errors (MSE) than those from the unweighted subsamples. To study the effects of weighting on true‐score equating, we also compared the distributions of the equated scores. In general, weighting produced good results.
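The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which weighting technique was used, so the example below assumes simple post-stratification on a single demographic variable. The function names, group labels, and target proportions are hypothetical and serve only as illustration; they are not taken from the report. The last helper computes the kind of mean square error that could be used to compare subsample linking parameters against a reference value.

```python
from collections import Counter


def poststratification_weights(sample_groups, target_props):
    """Return one weight per examinee so that the weighted group shares
    match target_props (assumed post-stratification, not the report's method)."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    # Every sampled group needs a target share, and every targeted group
    # needs at least one sampled examinee, otherwise no weight can fix it.
    unknown = [g for g in counts if g not in target_props]
    empty = [g for g, p in target_props.items() if p > 0 and counts[g] == 0]
    if unknown or empty:
        raise ValueError(f"cannot weight: unknown={unknown}, empty={empty}")
    # weight = (target share) / (observed share) for the examinee's group
    return [target_props[g] * n / counts[g] for g in sample_groups]


def weighted_shares(sample_groups, weights):
    """Weighted proportion of each group in the sample."""
    total = sum(weights)
    shares = {}
    for g, w in zip(sample_groups, weights):
        shares[g] = shares.get(g, 0.0) + w / total
    return shares


def mean_squared_error(estimates, reference):
    """MSE of subsample estimates (e.g. a linking slope) around a reference value."""
    return sum((e - reference) ** 2 for e in estimates) / len(estimates)


if __name__ == "__main__":
    # Hypothetical subsample: group "A" is over-represented (60% vs. 50% target).
    groups = ["A"] * 6 + ["B"] * 4
    target = {"A": 0.5, "B": 0.5}
    w = poststratification_weights(groups, target)
    print(weighted_shares(groups, w))
```

In a study like the one described, such weights would be applied when calibrating item parameters on each subsample, and the resulting linking parameters would then be compared with those from the full data set via the MSE.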