A Comparison of Score Aggregation Methods for Unidimensional Tests on Different Dimensions



Open Access


Author(s) - Fu Jianbin, Feng Yuling

Publication year - 2018

Publication title - ETS Research Report Series

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.235

H-Index - 5

ISSN - 2330-8516

DOI - 10.1002/ets2.12194

Subject(s) - raw score, statistics, test (biology), factor analysis, correlation, mathematics, curse of dimensionality, reliability (semiconductor), aggregate (composite), test score, factor regression model, econometrics, regression analysis, raw data, standardized test, proper linear model, paleontology, power (physics), physics, geometry, materials science, polynomial regression, quantum mechanics, composite material, biology

In this study, we propose aggregating test scores with unidimensional within‐test structure and multidimensional across‐test structure based on a 2‐level, 1‐factor model. In particular, we compare 6 score aggregation methods: average of standardized test raw scores (M1), regression factor score estimate of the 1‐factor model based on the correlation matrix of test raw scores (M2), overall ability from a unidimensional generalized partial credit model (GPCM) based on the items from all tests (M3), average of ability estimates from individual tests based on GPCM (M4), regression factor score of the 1‐factor model based on the correlation matrix of ability estimates from individual tests based on GPCM (M5), and general ability from the testlet model (M6). The 4 design factors considered in the simulation study are ability correlation between tests (.3, .5, .7, .8, and .9), test length (10, 20, 30, and 60 items), number of tests (2 and 4), and factor loading distribution (equal and unequal). The comparisons are also conducted on a real test data set with 2 tests. On the basis of the results, M1 and M4 are recommended for 2 tests, and M2, M5, and M6 are recommended for 3 or more tests. Several issues regarding attaining aggregate score reliability for intended uses and score aggregation types distinguished by test dimensionality are discussed, and practical suggestions for score aggregation are provided.
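To make the two raw-score methods concrete, the following is a minimal Python sketch of M1 (average of standardized test raw scores) and of the scoring step of M2 (regression factor score from the correlation matrix of raw scores). It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy score matrix, and the loading values are all hypothetical, and the 1-factor loadings are assumed to have been estimated beforehand with whatever factor-analysis software is at hand. The regression (Thomson) factor score used here is the standard form, in which the standardized scores are weighted by R⁻¹λ, where R is the correlation matrix and λ the vector of loadings.

```python
import numpy as np


def standardize(raw):
    """Column-wise z-scores; raw has shape (n_examinees, n_tests)."""
    raw = np.asarray(raw, dtype=float)
    return (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)


def aggregate_m1(raw):
    """M1: average of standardized test raw scores."""
    return standardize(raw).mean(axis=1)


def aggregate_m2(raw, loadings):
    """M2 scoring step: regression factor score for a 1-factor model.

    `loadings` is assumed to come from a separately fitted 1-factor model;
    this function does not estimate it.
    """
    z = standardize(raw)
    corr = np.corrcoef(z, rowvar=False)  # correlation matrix of test scores
    weights = np.linalg.solve(corr, np.asarray(loadings, dtype=float))
    return z @ weights


# Toy data invented for illustration: 5 examinees, 2 tests.
raw = np.array([[12, 30], [15, 28], [9, 22], [20, 35], [14, 31]])

m1 = aggregate_m1(raw)
m2 = aggregate_m2(raw, loadings=[0.8, 0.8])  # hypothetical equal loadings
```

With two tests and equal loadings, the regression weights R⁻¹λ are equal across tests, so the M2 score in this sketch is a positive multiple of the M1 score and ranks examinees identically. The two methods can diverge once loadings are unequal or more tests are involved.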
