
AN EMPIRICAL STUDY OF THE BROAD RANGE TAILORED TEST OF VERBAL ABILITY
Author(s) - Kreitzberg, Charles B.; Jones, Douglas H.
Publication year - 1980
Publication title - ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/j.2333-8504.1980.tb01195.x
Subject(s) - computerized adaptive testing , psychometrics , test reliability , psychology , statistics , computer science
This document is the final report of a study designed to investigate the performance of the Broad-Range Tailored Test (BRTT) of Verbal Ability. The BRTT is a computerized adaptive test developed by Frederic M. Lord. It employs a maximum likelihood selection strategy to choose items from an item pool stored on magnetic disk. The items selected are tailored to the individual testee and are presented on a computer terminal. Each testee responds to 25 items; at the conclusion of the test the computer calculates a verbal ability score for the individual. The test was designed to yield a verbal ability score ranging from the fifth-grade level to the graduate school level.

Performance of the BRTT had previously been investigated by means of simulation studies; the current study is the first empirical test of its performance. Two forms of the BRTT were administered to 146 high school students. The students also answered a posttest questionnaire in which they indicated their reactions to this form of testing.

Analyses revealed that the BRTT was more reliable than the PSAT for a number of scores derived from the data. The test-retest reliability of the BRTT was .8719 at the 25th item; the reliability of the PSAT verbal score (scaled down to 25 items) was .65. Analyses of the reliability of the BRTT versus the PSAT revealed that the tailored test was also more reliable than the conventional test at shorter lengths. Correlations between scores on the BRTT and the PSAT were reasonably high, typically about .86. These findings confirm theoretical expectations regarding the increased efficiency of adaptive tests relative to conventional tests.

The study investigated nine observed scores and score transformations. The most useful of these was found to be the expected proportion correct over the entire item pool. This score was highly reliable and was parallel with respect to mean values across Forms A and B; θ, the commonly employed latent-trait parameter, and Ω, a monotone transformation of θ, did not exhibit this characteristic. The information functions of the BRTT were calculated and compared favorably with simulation results previously reported by Lord. Thus the accuracy of the BRTT was in accord with theoretical expectation.

Student response to the computerized testing procedure was generally quite favorable. Students found the human-computer interface easy to use and less fatiguing than a long pencil-and-paper test. Operationally, the system performed well. Detailed analysis of 11 anomalous cases suggested refinements to the system. Response time was adequate and consistent, and the reliability of the hardware and software was excellent.

These results suggest that computerized adaptive testing is ready to take its first steps out of the laboratory and find a place in the educational community. The recommendations emerging from this study are: (1) that the organization collaborate with an interested client to develop an adaptive test for use in an educational setting; (2) that the potential of microprocessor-based systems for the delivery of adaptive testing be evaluated; (3) that extensions to item response theory and the development of alternative models for the provision of adaptive testing be explored; and (4) that high priority be accorded to the development of innovative assessment strategies for computer presentation. Such items might involve simulation and gaming, constructed responses, graphics, motion, sound, and time-dependent responses.
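For concreteness, the following Python sketch illustrates the general shape of an adaptive loop of the kind the abstract describes: at each step the testee's ability is re-estimated by maximum likelihood and the next item is chosen to be maximally informative at that estimate. It assumes a three-parameter logistic (3PL) item model of the kind used in Lord's item response theory work; the function names, the item-pool format, the grid-search estimator, and the starting ability of 0.0 are illustrative assumptions, not details taken from the report.

import numpy as np

def p_3pl(theta, a, b, c):
    # 3PL probability of a correct response (D = 1.7 scaling constant)
    return c + (1.0 - c) / (1.0 + np.exp(-1.7 * a * (theta - b)))

def item_information(theta, a, b, c):
    # Standard Fisher information of a 3PL item at ability theta
    p = p_3pl(theta, a, b, c)
    return (1.7 * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def ml_theta(responses, items, grid=np.linspace(-4.0, 4.0, 161)):
    # Maximum-likelihood ability estimate; a coarse grid search
    # sidesteps nonconvergence for all-correct/all-wrong patterns
    ll = np.zeros_like(grid)
    for u, (a, b, c) in zip(responses, items):
        p = p_3pl(grid, a, b, c)
        ll += u * np.log(p) + (1 - u) * np.log(1.0 - p)
    return grid[np.argmax(ll)]

def next_item(theta, pool, used):
    # Choose the unused item with maximum information at theta
    candidates = [i for i in range(len(pool)) if i not in used]
    return max(candidates, key=lambda i: item_information(theta, *pool[i]))

# Simulated administration: hypothetical pool of (a, b, c) parameters
rng = np.random.default_rng(0)
pool = [(rng.uniform(0.5, 2.0), rng.uniform(-3.0, 3.0), 0.2) for _ in range(200)]
true_theta = 1.0
used, responses, administered = set(), [], []
theta = 0.0  # provisional starting ability before any responses
for _ in range(25):  # 25 items per testee, as in the BRTT
    i = next_item(theta, pool, used)
    used.add(i)
    responses.append(int(rng.random() < p_3pl(true_theta, *pool[i])))
    administered.append(pool[i])
    theta = ml_theta(responses, administered)
print(f"final ML ability estimate: {theta:+.2f}")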
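The "expected proportion correct over the entire item pool" can be read as the testee's estimated true proportion-correct score on the full pool. A plausible formalization, stated here as an assumption rather than as the report's own definition, is

\[
\bar{P}(\hat{\theta}) = \frac{1}{N} \sum_{i=1}^{N} P_i(\hat{\theta}),
\]

where \(N\) is the number of items in the pool, \(P_i\) is the item response function of item \(i\), and \(\hat{\theta}\) is the testee's ability estimate. Because each \(P_i\) is increasing in \(\theta\), \(\bar{P}\) is a monotone transformation of \(\theta\) bounded between the mean guessing level and 1; since it is computed over the same fixed pool for every testee regardless of which items were administered, it may help explain why this score, unlike θ and Ω, showed parallel means across Forms A and B.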
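The information functions mentioned above have a standard definition in item response theory: for a test scored by maximum likelihood, the test information function is the sum of the item information functions,

\[
I(\theta) = \sum_{i=1}^{n} \frac{\left[ P_i'(\theta) \right]^2}{P_i(\theta)\left[ 1 - P_i(\theta) \right]}, \qquad \operatorname{SE}(\hat{\theta}) \approx \frac{1}{\sqrt{I(\theta)}},
\]

so higher information at an ability level means smaller measurement error there. Comparing this curve for the empirically administered BRTT items against Lord's simulated curves is presumably the comparison the abstract reports as favorable.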