A review and analysis of research on the test–retest reliability of professional judgment
Author(s) - Ashton, Robert H.
Publication year - 2000
Publication title - Journal of Behavioral Decision Making
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.136
H-Index - 76
eISSN - 1099-0771
pISSN - 0894-3257
DOI - 10.1002/1099-0771(200007/09)13:3<277::aid-bdm350>3.0.co;2-b
Subject(s) - reliability (semiconductor) , psychology , test (biology) , consistency (knowledge bases) , stability (learning theory) , internal consistency , quality (philosophy) , social psychology , cognitive psychology , session (web analytics) , applied psychology , computer science , epistemology , psychometrics , artificial intelligence , developmental psychology , machine learning , paleontology , power (physics) , philosophy , physics , quantum mechanics , world wide web , biology
This paper analyzes existing research on the test–retest reliability of human judgment, i.e. the extent to which a judge makes identical judgments when presented with identical stimuli on two occasions. Only research involving professional judges who make experimental judgments in a reasonable analog of their everyday experience is included. Studies of both internal consistency reliability and temporal stability reliability are analyzed (where the former refers to the inclusion of repeat stimuli in the same experimental session, and the latter refers to the repeating of the experimental task from a few days to several months later). It is found that (1) the test–retest reliability literature is concentrated in four substantive judgment areas (medicine/psychology, meteorology, human resources management, and business), (2) the literature is extremely variable in terms of research approach/design, the determinants or correlates of test–retest reliability that have been studied, and the quality of the execution and analysis, and (3) mean test–retest reliability differs across both substantive judgment areas and the internal consistency versus temporal stability distinction. An inescapable conclusion from the analysis is that our knowledge of this fundamental property of human judgment is quite meager. Therefore, the paper concludes with suggestions about future research that would address test–retest reliability more systematically. Copyright © 2000 John Wiley & Sons, Ltd.
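The abstract defines test–retest reliability as the extent to which a judge repeats the same judgments on identical stimuli across two occasions, but it does not say how the reviewed studies quantify it. As a minimal sketch, assuming reliability is summarized either by the Pearson correlation between the two sets of judgments or by the share of exactly repeated judgments, the calculation could look like the following. The function names and the ratings are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between a judge's first and second judgments.

    Assumption: correlation is used as the reliability index; the abstract
    itself does not name a specific statistic.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(first)
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    # Sums of cross-products and squared deviations from each mean.
    sxy = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    sxx = sum((a - mean_first) ** 2 for a in first)
    syy = sum((b - mean_second) ** 2 for b in second)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one occasion has no variance")
    return sxy / sqrt(sxx * syy)


def exact_agreement(first, second):
    """Share of stimuli that received an identical judgment on both occasions."""
    if len(first) != len(second) or not first:
        raise ValueError("need two equal-length, non-empty sequences")
    return sum(a == b for a, b in zip(first, second)) / len(first)


# Hypothetical ratings by one judge of the same five stimuli on two occasions.
first_session = [3, 5, 2, 4, 1]
second_session = [3, 4, 2, 5, 1]

print(round(pearson_r(first_session, second_session), 3))  # 0.9
print(exact_agreement(first_session, second_session))      # 0.6
```

The same calculation applies to both designs described in the abstract: for internal consistency the two lists come from repeated stimuli within one session, and for temporal stability they come from sessions separated by days or months.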