Design
Author(s) - Sonja Durr, Eric Galey, Ginelle Hustrulid, P. Colin Manikoth, Travis Masingale
Publication year - 1980
Publication title - Acta Psychiatrica Scandinavica
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 2.849
H-Index - 146
eISSN - 1600-0447
pISSN - 0001-690X
DOI - 10.1111/j.1600-0447.1980.tb10228.x
In general, the variance of a set of ratings can be apportioned between a component reflecting the heterogeneity of the subjects and another component representing differences among the raters. The intraclass correlation (R) measures the proportion of the total variance contributed by the subjects. R can be interpreted as a measure of inter-rater reliability as follows (cf. Bartko & Carpenter (1976)). If there is no rater disagreement, then the total variance is contributed by the subjects, in which case R = 1. Hence, R = 1 means that inter-rater reliability is perfect. If, on the other hand, disagreements between raters constitute a portion of the total variance, then R < 1. The greater the share of the total variance ascribed to rater disagreement, the smaller the value of R. A decrease in the value of R therefore reflects a corresponding decline in inter-rater reliability. At the extreme, R = 0 indicates that all the variance results from disagreements between raters, in which case inter-rater reliability is very poor.

Following the analysis of variance procedures developed by Fleiss (1973) for the balanced incomplete-blocks design, we used the ratings for each item of interest to compute a point estimate (R̂) of the intraclass correlation, as well as the corresponding 95% confidence interval (R_L, 1.0) for R. Each selected item in the ISPI thus gave us a separate value of R̂ and R_L to be used to measure the reliability of that item. R̂ is the well-known intraclass correlation coefficient (Fleiss (1973)).

The confidence interval for R can be used as a test of reliability by setting a lower bound R_0 on R, so that only values of R greater than R_0 are interpreted as indexing acceptable inter-rater reliability. By choosing R_0 = 0.7, we can be assured that at least 49% (0.70² = 0.49) of the variance of the ratings is attributable to subject, not rater, variance (W.H.O. (1973, p. 117)). Furthermore, if R_L > R_0 = 0.7, the hypothesis of unreliability (i.e., R < 0.7) can be rejected.

Because each subject was rated by half of the students in the morning and the other half in the afternoon, our study design was a composite of two balanced incomplete-blocks designs: one for the morning and one for the afternoon. We were, therefore, able to compute R̂ and R_L twice for each item: once using the morning interviews, and again using the afternoon data. Recall that the same student interviewers were used in both the morning and afternoon studies, even though each student rated every subject only once (in either the morning or the afternoon, but never in both).
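Fleiss's balanced incomplete-blocks formulas are not reproduced in this section, so the sketch below only illustrates the estimate-plus-lower-bound logic described above, using the simpler complete design in which every rater scores every subject: a one-way random-effects intraclass correlation R̂ with a one-sided 95% lower confidence bound R_L, compared against the criterion R_0 = 0.7. The function name, the simulated ratings, and the standard F-based bound (computed here with SciPy) are illustrative assumptions, not the paper's actual computation.

```python
import numpy as np
from scipy.stats import f as f_dist

def icc_with_lower_bound(ratings, alpha=0.05):
    """One-way random-effects ICC with a one-sided lower confidence bound.

    ratings: (n_subjects, k_raters) array, every subject rated by k raters
             (a complete design, unlike the paper's incomplete-blocks design).
    Returns (icc_hat, r_lower), giving the interval (r_lower, 1.0).
    """
    ratings = np.asarray(ratings, dtype=float)
    n, k = ratings.shape
    grand_mean = ratings.mean()
    subject_means = ratings.mean(axis=1)

    # Between-subject and within-subject mean squares from one-way ANOVA.
    ms_between = k * np.sum((subject_means - grand_mean) ** 2) / (n - 1)
    ms_within = np.sum((ratings - subject_means[:, None]) ** 2) / (n * (k - 1))

    # Point estimate: proportion of total variance contributed by subjects.
    icc_hat = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

    # One-sided lower bound: shrink the observed F ratio by the critical F.
    f_obs = ms_between / ms_within
    f_low = f_obs / f_dist.ppf(1 - alpha, n - 1, n * (k - 1))
    r_lower = (f_low - 1) / (f_low + k - 1)
    return icc_hat, r_lower

# Hypothetical data: 6 subjects, each scored by 4 raters on one item.
rng = np.random.default_rng(0)
true_scores = rng.normal(size=(6, 1))
ratings = true_scores + 0.3 * rng.normal(size=(6, 4))

icc_hat, r_lower = icc_with_lower_bound(ratings)
R0 = 0.7  # reliability criterion from the text
print(f"R-hat = {icc_hat:.2f}, 95% interval = ({r_lower:.2f}, 1.0)")
print("reject unreliability" if r_lower > R0 else "cannot reject unreliability")
```

As in the test described above, the decision printed at the end turns on the lower bound rather than the point estimate: unreliability (R < 0.7) is rejected only when R_L itself exceeds 0.7.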