Open Access

EFFECT OF RASCH CALIBRATION ON ABILITY AND DIF ESTIMATION IN COMPUTER‐ADAPTIVE TESTS

Author(s) - Rebecca Zwick, Dorothy T. Thayer, Marilyn Wingersky

Publication year - 1994

Publication title - ETS Research Report Series

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.235

H-Index - 5

ISSN - 2330-8516

DOI - 10.1002/j.2333-8504.1994.tb01605.x

Subject(s) - Rasch model, differential item functioning, statistics, computerized adaptive testing, item response theory, polytomous Rasch model, range (aeronautics), mathematics, psychology, calibration, econometrics, psychometrics, materials science, composite material

A simulation study of methods of assessing differential item functioning (DIF) in computer‐adaptive tests (CATs) was conducted by Zwick, Thayer and Wingersky (in press; 1993). Results showed that modified versions of the Mantel‐Haenszel and standardization methods work well with CAT data. In that study, data were generated using the three‐parameter logistic (3PL) model and this same model was assumed in obtaining item parameter estimates. In the current study, 3PL item response data were used, but the Rasch model was assumed in obtaining item parameter estimates, which, in turn, determined the information table to be used in the item selection algorithm. New Rasch‐based expected true scores were obtained for each examinee, based on responses to the CAT items. As in the previous study, the DIF statistics were highly correlated with the generating DIF, and the means and standard deviations of these statistics across items were close to their nominal values. There was, however, a tendency for DIF statistics to be slightly smaller in magnitude than in the 3PL analysis, resulting in a lower probability of detecting items with extreme DIF. This reduced sensitivity appeared to be related to a degradation in the accuracy of matching. Expected true scores from the Rasch‐based CAT tended to be biased downward, particularly for lower‐ability examinees. Unlike the Rasch CAT scores, Rasch expected true scores based on nonadaptive administration of all pool items behaved quite well, as did the nonadaptive and CAT‐based expected true scores obtained using the 3PL model.
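To make the quantities named in the abstract concrete, the following is a minimal Python sketch of the two item response functions, an expected true score on an item pool, and the standard Mantel-Haenszel D-DIF statistic computed over matching strata. It is not the authors' procedure: the abstract does not describe their modified Mantel-Haenszel and standardization methods, so only the textbook forms are shown. The function names, the example pool, the example counts, and the scaling constant D = 1.7 are all illustrative assumptions.

```python
import math

def p_3pl(theta, a, b, c, D=1.7):
    """3PL probability of a correct response.
    a = discrimination, b = difficulty, c = lower asymptote (guessing).
    D = 1.7 is a common scaling constant, assumed here."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def p_rasch(theta, b):
    """Rasch probability: a single difficulty parameter, no guessing."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_true_score(theta, items, model="3pl"):
    """Sum of item success probabilities over a pool at ability theta.
    items is a list of (a, b, c) tuples; the Rasch branch uses only b."""
    if model == "3pl":
        return sum(p_3pl(theta, a, b, c) for a, b, c in items)
    return sum(p_rasch(theta, b) for _, b, _ in items)

def mh_d_dif(strata):
    """Mantel-Haenszel D-DIF on the ETS delta scale.
    strata: one (ref_right, ref_wrong, foc_right, foc_wrong) tuple per
    matching level (for example, intervals of expected true score).
    Returns -2.35 * ln(alpha_MH); negative values indicate an item that
    is harder for the focal group after matching. Raises
    ZeroDivisionError if the denominator sum is zero."""
    num = den = 0.0
    for ref_right, ref_wrong, foc_right, foc_wrong in strata:
        n = ref_right + ref_wrong + foc_right + foc_wrong
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += ref_right * foc_wrong / n
        den += ref_wrong * foc_right / n
    return -2.35 * math.log(num / den)

# Hypothetical three-item pool: (a, b, c) per item.
pool = [(1.0, -1.0, 0.2), (0.8, 0.0, 0.2), (1.2, 1.0, 0.2)]
score_3pl = expected_true_score(-2.0, pool, model="3pl")
score_rasch = expected_true_score(-2.0, pool, model="rasch")

# Hypothetical counts for one item at two matching levels.
d_dif = mh_d_dif([(40, 10, 30, 20), (30, 20, 20, 30)])
```

In this sketch the matching variable enters only through how examinees are assigned to strata, which is why inaccurate expected true scores can weaken the resulting DIF statistics.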
