
ANOTHER LOOK AT INTER‐RATER AGREEMENT
Author(s) - Rebecca Zwick
Publication year - 1986
Publication title - ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/j.2330-8516.1986.tb00189.x
Subject(s) - homogeneity (statistics) , homogeneous , agreement , statistics , mathematics , inter rater reliability , psychology , econometrics , combinatorics , rating scale , philosophy , linguistics
Most currently used measures of inter‐rater agreement for the nominal case incorporate a correction for ‘chance agreement.’ The definition of chance agreement is not the same for all coefficients, however. Three chance‐corrected coefficients are Cohen's κ, Scott's Π, and the S index of Bennett, Goldstein and Alpert, which has reappeared in many guises. For all three measures, chance is defined to include independence between raters. Scott's Π involves a further assumption of homogeneous rater marginals under chance. For the S coefficient, uniform marginals for both raters under chance are assumed. Because of these disparate formulations, κ, Π, and S can lead to different conclusions about rater agreement. Consideration of the properties of these measures leads to the recommendation that a test of marginal homogeneity be conducted as a first step in the assessment of rater agreement. Rejection of the hypothesis of homogeneity is sufficient to conclude that agreement is poor. If the homogeneity hypothesis is retained, Π can be used as an index of agreement.
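As background for the coefficients named in the abstract (the formulas themselves are not given there): each has the form (observed agreement − chance agreement) / (1 − chance agreement), and the three differ only in how chance agreement is computed from the raters' marginal distributions. The Python sketch below is an illustration under those standard definitions, not code from the report; the function name and the example table are invented for illustration.

    # Illustrative sketch: compute Cohen's kappa, Scott's pi, and the
    # Bennett-Goldstein-Alpert S from a k x k table of joint classification counts.
    import numpy as np

    def agreement_coefficients(counts):
        """counts[i][j] = number of objects rater 1 assigned to category i
        and rater 2 assigned to category j."""
        counts = np.asarray(counts, dtype=float)
        n = counts.sum()
        k = counts.shape[0]
        p = counts / n                      # joint proportions
        p_obs = np.trace(p)                 # observed agreement (diagonal mass)

        row = p.sum(axis=1)                 # rater 1 marginals
        col = p.sum(axis=0)                 # rater 2 marginals

        # Chance agreement under each definition of "chance":
        pe_kappa = np.sum(row * col)               # independence, each rater's own marginals
        pe_pi = np.sum(((row + col) / 2.0) ** 2)   # independence, common (averaged) marginals
        pe_s = 1.0 / k                             # independence, uniform marginals

        return {
            "kappa": (p_obs - pe_kappa) / (1.0 - pe_kappa),
            "pi":    (p_obs - pe_pi)    / (1.0 - pe_pi),
            "S":     (p_obs - pe_s)     / (1.0 - pe_s),
        }

    # Hypothetical example: two raters, three categories, unbalanced marginals.
    table = [[40,  5,  2],
             [ 6, 20,  4],
             [ 3,  5, 15]]
    print(agreement_coefficients(table))

With unbalanced marginals such as those in the example, the three coefficients can diverge, which is the behavior the abstract points to. The recommended first step, a test of marginal homogeneity, amounts to testing whether the two raters' marginal distributions (row and col above) could be the same; the abstract does not specify which test to use.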