Influence of Type of Judge, Normative Information, and Discussion on Standards Recommended for the National Teacher Examinations
Author(s) - John Christian Busch, Richard M. Jaeger
Publication year - 1990
Publication title - Journal of Educational Measurement
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.917
H-Index - 47
eISSN - 1745-3984
pISSN - 0022-0655
DOI - 10.1111/j.1745-3984.1990.tb00739.x
Subject(s) - normative information , standard setting , reliability , psychology , data collection , applied psychology , statistics
There are few empirical investigations of the consequences of using widely recommended data collection procedures in conjunction with a specific standard‐setting method such as the Angoff (1971) procedure. Such recommendations include the use of several types of judges, the provision of normative information on examinees' test performance, and the opportunity to discuss and reconsider initial recommendations in an iterative standard‐setting procedure. This study of 236 expert judges investigated the effects of using these recommended procedures on (a) average recommended test standards, (b) the variability of recommended test standards, and (c) the reliability of recommended standards for seven subtests of the National Teacher Examinations Communication Skills and General Knowledge Tests. Small, but sometimes statistically significant, changes in mean recommended test standards were observed when judges were allowed to reconsider their initial recommendations following review of normative information and discussion. Means for public school judges changed more than did those for college or university judges. In addition, there was a significant reduction in the within‐group variability of standards recommended for several subtests. Methods for estimating the reliability of recommended test standards proposed by Kane and Wilson (1984) were applied, and their hypothesis of positive covariation between empirical item difficulties and mean recommended standards was confirmed. The data collection procedures examined in this study resulted in substantial increases in the reliability of recommended test standards.
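For readers unfamiliar with the Angoff procedure referenced in the abstract, the sketch below illustrates the basic aggregation it implies: each judge estimates, item by item, the probability that a minimally competent examinee would answer correctly, and a judge's recommended standard is the sum of those estimates. This is a minimal, hypothetical simulation (panel size, subtest length, rating distributions, and the convergence rule are all assumptions, not the study's data or analysis code); it merely shows how reconsideration after normative information and discussion can leave mean standards nearly unchanged while shrinking within-group variability, the pattern the study reports.

```python
import numpy as np

# Hypothetical illustration of Angoff-style standard setting;
# all numbers below are assumptions, not data from the study.
rng = np.random.default_rng(0)

n_judges, n_items = 20, 40  # assumed panel and subtest sizes
ratings_round1 = rng.uniform(0.3, 0.9, size=(n_judges, n_items))

# A judge's recommended standard = sum of item probability estimates.
judge_cuts_r1 = ratings_round1.sum(axis=1)

# Assume a second round in which judges, after reviewing normative
# information and discussing, move their item ratings partway toward
# the panel mean for each item (a simple convergence assumption).
panel_mean_item = ratings_round1.mean(axis=0)
ratings_round2 = 0.7 * ratings_round1 + 0.3 * panel_mean_item
judge_cuts_r2 = ratings_round2.sum(axis=1)

print("Round 1: mean cut = %.2f, SD = %.2f"
      % (judge_cuts_r1.mean(), judge_cuts_r1.std(ddof=1)))
print("Round 2: mean cut = %.2f, SD = %.2f"
      % (judge_cuts_r2.mean(), judge_cuts_r2.std(ddof=1)))
```

Running this toy simulation leaves the panel's mean recommended standard essentially unchanged while the standard deviation across judges drops, which is the qualitative effect (small mean shifts, reduced within-group variability) described in the abstract.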