
A Comparison of Score Aggregation Methods for Unidimensional Tests on Different Dimensions
Author(s) - Fu Jianbin, Feng Yuling
Publication year - 2018
Publication title - ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/ets2.12194
Subject(s) - raw score , statistics , test (biology) , factor analysis , correlation , mathematics , curse of dimensionality , reliability (semiconductor) , aggregate (composite) , test score , factor regression model , econometrics , regression analysis , raw data , standardized test , proper linear model , paleontology , power (physics) , physics , geometry , materials science , polynomial regression , quantum mechanics , composite material , biology
In this study, we propose aggregating test scores with unidimensional within‐test structure and multidimensional across‐test structure based on a 2‐level, 1‐factor model. In particular, we compare 6 score aggregation methods: average of standardized test raw scores (M1), regression factor score estimate of the 1‐factor model based on the correlation matrix of test raw scores (M2), overall ability from a unidimensional generalized partial credit model (GPCM) based on the items from all tests (M3), average of ability estimates from individual tests based on GPCM (M4), regression factor score of the 1‐factor model based on the correlation matrix of ability estimates from individual tests based on GPCM (M5), and general ability from the testlet model (M6). The 4 design factors considered in the simulation study are ability correlation between tests (.3, .5, .7, .8, and .9), test length (10, 20, 30, and 60 items), number of tests (2 and 4), and factor loading distribution (equal and unequal). The comparisons are also conducted on a real test data set with 2 tests. On the basis of the results, M1 and M4 are recommended for 2 tests, and M2, M5, and M6 are recommended for 3 or more tests. Several issues regarding attaining aggregate score reliability for intended uses and score aggregation types distinguished by test dimensionality are discussed, and practical suggestions for score aggregation are provided.
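To make the two raw-score methods concrete, the following is a minimal Python sketch of M1 (average of standardized raw scores) and M2 (regression factor score from a 1-factor model fitted to the correlation matrix of raw scores). The abstract does not say how the 1-factor model is estimated, so the iterated principal-axis routine here is an assumed stand-in, and all function names are hypothetical. The sketch is meant for three or more tests, since a 1-factor model with only two indicators is not identified without extra constraints such as equal loadings.

```python
import numpy as np


def standardize(raw):
    """Column-wise z-scores of an (examinees x tests) raw-score matrix."""
    raw = np.asarray(raw, dtype=float)
    return (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)


def m1_average_z(raw):
    """M1: unweighted mean of the standardized test raw scores."""
    return standardize(raw).mean(axis=1)


def one_factor_loadings(R, n_iter=200, tol=1e-8):
    """Assumed estimator: iterated principal-axis loadings for one factor."""
    R = np.asarray(R, dtype=float)
    # Start from squared multiple correlations as communality estimates.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    lam = np.zeros(R.shape[0])
    for _ in range(n_iter):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)
        vals, vecs = np.linalg.eigh(reduced)  # eigenvalues in ascending order
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        # Keep communalities inside (0, 1) to avoid improper solutions.
        new_h2 = np.clip(lam ** 2, 0.0, 0.999)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    # Eigenvector sign is arbitrary; orient the factor positively.
    return -lam if lam.sum() < 0 else lam


def m2_regression_factor_score(raw):
    """M2: regression factor score, weights w = R^{-1} * loadings."""
    z = standardize(raw)
    R = np.corrcoef(z, rowvar=False)
    lam = one_factor_loadings(R)
    w = np.linalg.solve(R, lam)
    return z @ w
```

Both functions take an examinees-by-tests array of raw scores and return one aggregate score per examinee. M1 weights every test equally, whereas M2 gives more weight to tests with higher loadings on the common factor, which is why the two can diverge when loadings are unequal.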