AN INVESTIGATION OF THE USE OF SIMPLIFIED IRT MODELS FOR SCALING AND EQUATING THE TOEFL TEST




Open Access


Author(s) - Way Walter D., Reese Clyde M.

Publication year - 1990

Publication title - ETS Research Report Series

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.235

H-Index - 5

ISSN - 2330-8516

DOI - 10.1002/j.2333-8504.1990.tb01365.x

Subject(s) - equating, scaling, statistics, Test of English as a Foreign Language, sample size determination, mathematics, item response theory, econometrics, Rasch model, psychometrics, language education, mathematics education

The purpose of this study was to explore two alternative item response theory (IRT) models for scaling and equating the TOEFL, a modified one‐parameter model (M1PL) and a modified two‐parameter model (M2PL), and to compare the item scaling and test equating results they produce with results from the three‐parameter model (3PL) currently used to scale and equate the TOEFL. The study simulated a typical TOEFL equating using artificial data. The simulated equatings were compared in terms of correlations between estimated and generating item parameters, model‐data fit, and agreement between simulated score conversions and conversions based on the generating parameters.

The results clearly indicated that the 3PL model performed better than the M1PL and M2PL models on each of the evaluation criteria. There was also evidence that the M2PL model outperformed the M1PL model, particularly in model‐data fit and in the weighted root mean square difference statistics used to evaluate the simulated score conversions. Discrepancies between score conversions based on the M1PL and M2PL models and those based on the 3PL model tended to occur at the lower and upper ends of the score scales. Finally, for the 3PL model, correlations between item parameter estimates and generating parameters tended to be affected by sample size, but neither the quality of model‐data fit nor the quality of the simulated equatings appeared to be sensitive to the different sample sizes used in the study.
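The abstract compares the 3PL with two constrained alternatives but does not define exactly how the M1PL and M2PL are modified. The Python sketch below writes the three item response functions under one assumed reading: the M2PL as a 3PL whose lower asymptote c is fixed at a single common value, and the M1PL as the same model with a common slope as well, so that only difficulty b varies by item. The fixed c of 0.2, the common slope of 1.0, and the scaling constant D = 1.7 are illustrative assumptions, not values taken from the report.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumed, not from the report)


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """3PL: probability of a correct response given ability theta and
    item discrimination a, difficulty b, and lower asymptote c."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def p_m2pl(theta: float, a: float, b: float, c_fixed: float = 0.2) -> float:
    """Hypothetical M2PL reading: a and b vary by item; c is one shared constant."""
    return p_3pl(theta, a, b, c_fixed)


def p_m1pl(theta: float, b: float, a_common: float = 1.0, c_fixed: float = 0.2) -> float:
    """Hypothetical M1PL reading: only b varies by item; slope and c are shared."""
    return p_3pl(theta, a_common, b, c_fixed)


if __name__ == "__main__":
    # One hypothetical item (b = 0.5) evaluated under each model at a few ability levels.
    for theta in (-2.0, 0.0, 2.0):
        print(theta,
              round(p_3pl(theta, 1.3, 0.5, 0.15), 3),
              round(p_m2pl(theta, 1.3, 0.5), 3),
              round(p_m1pl(theta, 0.5), 3))
```

Under this reading, fitting the M1PL or M2PL to data generated by the 3PL forces every item onto shared constraints, which is one way the kind of model-data misfit the study measured can arise.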
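Comparing simulated score conversions with conversions based on the generating parameters can be illustrated with IRT true-score equating, a common way to turn item parameters into a raw-score conversion; the abstract does not name the equating method, so this is an assumed choice. Each new-form raw score is mapped to the ability at which the new form's test characteristic curve equals that score, and then to the old form's expected score at that ability. A weighted root mean square difference then summarizes how far one conversion departs from another. The forms, the shift standing in for estimation error, and the uniform weights are all hypothetical (weights would plausibly be examinee frequencies at each score point), and the sketch stops at raw-score equivalents without any conversion to reported scale scores.

```python
import math

D = 1.7  # assumed logistic scaling constant


def p_3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def true_score(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)


def theta_for_score(x, items, lo=-8.0, hi=8.0, tol=1e-10):
    """Invert the (increasing) test characteristic curve by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if true_score(mid, items) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def true_score_conversion(new_items, old_items):
    """Old-form true-score equivalent for each new-form raw score with a finite theta.

    Scores at or below the guessing floor sum(c), and the perfect score,
    have no finite theta and are skipped."""
    floor = math.fsum(c for _, _, c in new_items)
    return {x: true_score(theta_for_score(x, new_items), old_items)
            for x in range(len(new_items))
            if x > floor + 1e-9}


def weighted_rmsd(conv_a, conv_b, weights):
    """sqrt(sum_x w_x * (a_x - b_x)^2 / sum_x w_x) over shared score points."""
    shared = [x for x in conv_a if x in conv_b and weights.get(x, 0) > 0]
    num = sum(weights[x] * (conv_a[x] - conv_b[x]) ** 2 for x in shared)
    den = sum(weights[x] for x in shared)
    return math.sqrt(num / den)


if __name__ == "__main__":
    # Hypothetical 30-item forms; the old form is slightly harder.
    new_form = [(1.0, -2.0 + 4.0 * j / 29, 0.2) for j in range(30)]
    old_form = [(a, b + 0.2, c) for a, b, c in new_form]
    generating = true_score_conversion(new_form, old_form)
    # Stand-in for a conversion built from estimated parameters: shift every b.
    estimated_new = [(a, b + 0.05, c) for a, b, c in new_form]
    simulated = true_score_conversion(estimated_new, old_form)
    weights = {x: 1.0 for x in generating}  # uniform weights (assumption)
    print(f"weighted RMSD = {weighted_rmsd(generating, simulated, weights):.4f}")
```

Restricting the weights to part of the score range, for example only the lowest or highest raw scores, is one way to see where two conversions diverge, which is where the abstract reports that the M1PL and M2PL conversions departed from the 3PL.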
