Open Access
EFFECT OF RASCH CALIBRATION ON ABILITY AND DIF ESTIMATION IN COMPUTER‐ADAPTIVE TESTS
Author(s) - Zwick Rebecca, Thayer Dorothy T., Wingersky Marilyn
Publication year - 1994
Publication title - ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/j.2333-8504.1994.tb01605.x
Subject(s) - Rasch model, differential item functioning, statistics, computerized adaptive testing, item response theory, polytomous Rasch model, range (aeronautics), mathematics, psychology, calibration, econometrics, psychometrics, materials science, composite material
A simulation study of methods of assessing differential item functioning (DIF) in computer‐adaptive tests (CATs) was conducted by Zwick, Thayer and Wingersky (in press; 1993). Results showed that modified versions of the Mantel‐Haenszel and standardization methods work well with CAT data. In that study, data were generated using the three‐parameter logistic (3PL) model and this same model was assumed in obtaining item parameter estimates. In the current study, 3PL item response data were used, but the Rasch model was assumed in obtaining item parameter estimates, which, in turn, determined the information table to be used in the item selection algorithm. New Rasch‐based expected true scores were obtained for each examinee, based on responses to the CAT items. As in the previous study, the DIF statistics were highly correlated with the generating DIF, and the means and standard deviations of these statistics across items were close to their nominal values. There was, however, a tendency for DIF statistics to be slightly smaller in magnitude than in the 3PL analysis, resulting in a lower probability of detecting items with extreme DIF. This reduced sensitivity appeared to be related to a degradation in the accuracy of matching. Expected true scores from the Rasch‐based CAT tended to be biased downward, particularly for lower‐ability examinees. Unlike the Rasch CAT scores, Rasch expected true scores based on nonadaptive administration of all pool items behaved quite well, as did the nonadaptive and CAT‐based expected true scores obtained using the 3PL model.
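
The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' code, and it implements the standard Mantel-Haenszel statistic rather than the modified CAT version the study used. It assumes the usual 3PL form with scaling constant D = 1.7, an expected true score defined as the sum of item success probabilities over a set of items, and the ETS delta-scale statistic MH D-DIF = -2.35 ln(alpha_MH). The function names, the tuple layouts, and the choice of D are assumptions made for illustration. Examinees are assumed to be grouped into matching strata already (for example, intervals of expected true score).

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.

    a: discrimination, b: difficulty, c: lower asymptote (guessing).
    D = 1.7 is an assumed scaling constant, not taken from the source.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def p_rasch(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def expected_true_score(theta, items, D=1.7):
    """Sum of 3PL success probabilities over items given as (a, b, c) tuples."""
    return sum(p_3pl(theta, a, b, c, D) for a, b, c in items)


def mh_d_dif(strata):
    """Standard Mantel-Haenszel D-DIF for one studied item.

    strata: list of (ref_right, ref_wrong, focal_right, focal_wrong)
    counts, one tuple per matched score level.
    Returns -2.35 * ln(alpha_MH); negative values indicate an item
    that is harder for the focal group after matching.
    """
    num = 0.0
    den = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        total = ref_right + ref_wrong + focal_right + focal_wrong
        if total == 0:
            continue  # an empty stratum contributes nothing
        num += ref_right * focal_wrong / total
        den += ref_wrong * focal_right / total
    alpha_mh = num / den  # common odds ratio across strata
    return -2.35 * math.log(alpha_mh)
```

In this sketch, a Rasch calibration of 3PL data corresponds to fitting `p_rasch` to responses generated by `p_3pl` with c > 0. Because `p_rasch` has no lower asymptote, that is one plausible place for matching-score distortion to arise among lower-ability examinees, which is where the abstract reports the bias.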
