Prediction of true test scores from observed item scores and ancillary data
Author(s) - Haberman Shelby J., Yao Lili, Sinharay Sandip
Publication year - 2015
Publication title - British Journal of Mathematical and Statistical Psychology
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 3.157
H-Index - 51
eISSN - 2044-8317
pISSN - 0007-1102
DOI - 10.1111/bmsp.12052
Subject(s) - statistics , test (biology) , item response theory , mathematics , econometrics , psychometrics , paleontology , biology
In many educational tests which involve constructed responses, a traditional test score is obtained by adding together item scores obtained through holistic scoring by trained human raters. For example, this practice was used until 2008 in the case of GRE® General Analytical Writing and until 2009 in the case of TOEFL® iBT Writing. With use of natural language processing, it is possible to obtain additional information concerning item responses from computer programs such as e-rater®. In addition, available information relevant to examinee performance may include scores on related tests. We suggest application of standard results from classical test theory to the available data to obtain best linear predictors of true traditional test scores. In performing such analysis, we require estimation of variances and covariances of measurement errors, a task which can be quite difficult in the case of tests with limited numbers of items and with multiple measurements per item. As a consequence, a new estimation method is suggested based on samples of examinees who have taken an assessment more than once. Such samples are typically not random samples of the general population of examinees, so that we apply statistical adjustment methods to obtain the needed estimated variances and covariances of measurement errors. To examine practical implications of the suggested methods of analysis, applications are made to GRE General Analytical Writing and TOEFL iBT Writing. Results obtained indicate that substantial improvements are possible both in terms of reliability of scoring and in terms of assessment reliability.
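
The best linear predictor mentioned in the abstract can be illustrated with a minimal sketch of the textbook classical test theory result. It is not the authors' implementation. It assumes that the observed vector is X = T + e, that the measurement errors e have mean zero and are uncorrelated with the true scores T, and that the target is a linear combination c'T of the true scores. The function name `best_linear_predictor`, its parameters, and all numbers are hypothetical and chosen only for illustration; the error covariance matrix `cov_e` is taken as given, whereas the paper estimates it from repeat test takers.

```python
import numpy as np


def best_linear_predictor(x, mean_x, cov_x, cov_e, c):
    """Best linear predictor of tau = c'T from observed x = T + e.

    Assumes (textbook classical test theory, not taken from the paper):
    errors have mean zero and are uncorrelated with true scores, so that
    Cov(T) = Cov(X) - Cov(e) and Cov(X, tau) = Cov(T) c.
    """
    x = np.asarray(x, dtype=float)
    mean_x = np.asarray(mean_x, dtype=float)
    cov_x = np.asarray(cov_x, dtype=float)
    cov_e = np.asarray(cov_e, dtype=float)
    c = np.asarray(c, dtype=float)

    cov_true = cov_x - cov_e                  # covariance of true scores
    cov_x_tau = cov_true @ c                  # Cov(X, tau)
    weights = np.linalg.solve(cov_x, cov_x_tau)

    prediction = c @ mean_x + weights @ (x - mean_x)
    # Mean squared error of prediction: Var(tau) - Cov(tau, X) Cov(X)^-1 Cov(X, tau)
    mse = c @ cov_true @ c - cov_x_tau @ weights
    return prediction, weights, mse


# One observed score: the weight reduces to the reliability (here 3/4).
pred1, w1, mse1 = best_linear_predictor(
    x=[14.0], mean_x=[10.0], cov_x=[[4.0]], cov_e=[[1.0]], c=[1.0]
)

# Adding a second, correlated measurement (for example, a hypothetical
# automated score) cannot increase the mean squared error.
pred2, w2, mse2 = best_linear_predictor(
    x=[14.0, 7.0],
    mean_x=[10.0, 6.0],
    cov_x=[[4.0, 1.5], [1.5, 3.0]],
    cov_e=[[1.0, 0.0], [0.0, 0.5]],
    c=[1.0, 0.0],
)
```

In the single-score case the predictor shrinks the observed score toward the mean by the reliability coefficient. With ancillary measurements, the extra weights draw on the covariance between the measurements' true scores, which is why the error covariances have to be estimated in the first place.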
