
A STUDY OF THE LONG‐TERM STABILITY OF GRE GENERAL TEST SCORES
Author(s) - Wilson, Kenneth M.
Publication year - 1988
Publication title - ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/j.2330-8516.1988.tb00295.x
Subject(s) - psychology, test score, standardized test, ethnic group, demography, social psychology
SUMMARY

In 1985, the GRE Program policy of cautioning users about accepting GRE scores five or more years old was replaced by a policy of not reporting such scores, because of the possibility that older test scores may not adequately represent the current capabilities of the applicants who present them. The present study was undertaken to obtain empirical evidence pertinent to an evaluation of these policies.

The study was concerned with questions such as the following: Does the rank ordering of individuals by their test scores change over time? Are there significant average changes, upward or downward, in test performance? Are answers to these questions similar for all three General Test measures (verbal, quantitative, analytical) and/or for different subgroups (e.g., by sex, ethnicity, graduate area)? To be most pertinent for evaluation of policies regarding the treatment of older test scores, answers to these questions should be generalizable to a particular subpopulation, namely, applicants who present “older” test scores.

The study employed data from GRE files for more than 15,000 repeating examinees (U.S. citizens only) tested most recently during the 1985‐86 testing year, including 3,614 “long‐term” repeaters, that is, examinees with test‐retest intervals of five or more years.

For a variety of reasons, data for the sample of long‐term repeaters proved to be especially relevant for study purposes. First, there is reason to believe that the great majority of long‐term repeaters were applicants who had presented older test scores and had then been asked to repeat the test by graduate schools acting upon a policy of not accepting such scores. Thus, the data in GRE files for long‐term repeaters appear to be authentic test‐retest data for a representative sample of individuals from the population of interest. Second, the initial test means for the long‐term repeaters equaled or slightly exceeded corresponding means for all U.S. examinees tested during 1985‐86.
Thus, interpretation of observed score gains for long‐term repeaters was not complicated by regression effects or other test‐retest effects that need to be taken into account in evaluating observed gains in low‐scoring samples generally and, especially, samples of examinees who repeat the GRE after relatively brief time intervals.

Test‐retest correlations and mean score gains were analyzed in samples of repeaters classified by time between test administrations in intervals ranging from less than six months to 10 or more years. Data for long‐term repeaters (five or more years) were consolidated for analysis of test‐retest correlations and average score gains for subgroups: sex, major area (humanities, social sciences, biosciences, and math‐physical sciences), and ethnic group membership (Asian American, Black, Hispanic, White). Primary emphasis was on the long‐term stability of GRE verbal and quantitative scores (most long‐term repeaters took two different versions of the analytical ability measure with scores that are not directly comparable).

Based on trends in test‐retest correlations across the time‐interval samples, there was a relatively high level of stability in the rank ordering of verbal and quantitative test scores over periods of 10 or more years. Test‐retest coefficients of approximately .86 were found in each time‐interval category. However, there were average increases in test performance. Long‐term repeaters registered average gains of 40 points on the verbal measure and 17 points on the quantitative test. The pattern of greater verbal gain than quantitative gain was consistent across subgroups, but there were significant subgroup differences in average amount of gain on both measures. Subgroups were more sharply differentiated by retest scores than they were 5 to 10 years earlier by their initial test scores. Generally speaking, subgroups with higher initial means on a measure registered greater average gains on that measure.
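As an illustration of the two statistics reported above (the test‐retest correlation and the mean score gain within each time‐interval category), the following is a minimal Python sketch. It is not the study's actual procedure: the records, the category labels, and the function names `pearson_r` and `summarize` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean

# Hypothetical records (NOT study data): (interval category, initial score, retest score).
records = [
    ("under 5 years", 480, 500),
    ("under 5 years", 550, 560),
    ("under 5 years", 610, 640),
    ("5 or more years", 500, 540),
    ("5 or more years", 560, 610),
    ("5 or more years", 620, 650),
]


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def summarize(records):
    """Group records by interval category; report n, test-retest r, and mean gain."""
    groups = defaultdict(list)
    for category, initial, retest in records:
        groups[category].append((initial, retest))

    summary = {}
    for category, pairs in groups.items():
        initial = [p[0] for p in pairs]
        retest = [p[1] for p in pairs]
        gains = [r - i for i, r in pairs]
        summary[category] = {
            "n": len(pairs),
            "test_retest_r": pearson_r(initial, retest),
            "mean_gain": mean(gains),
        }
    return summary


if __name__ == "__main__":
    for category, stats in summarize(records).items():
        print(category, stats)
```

A high correlation indicates that examinees keep roughly the same rank order on retest, while the mean gain captures a shift in average level; the two can coexist, which is the pattern the study reports.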
In the analysis of gains by major area, quantitative gain varied directly with quantitative emphasis as well as quantitative means: math‐science and bioscience majors registered greater average gains than did social science majors or humanities majors (who showed the lowest quantitative gain). However, major‐area effects were not apparent in the verbal analysis. The two highest‐scoring subgroups (humanities and math‐science majors) registered the largest gains, despite obvious disparities in degree of verbal emphasis.

Study findings suggest that during the 5 to 10 or more years that elapsed between their entry into the GRE population (at an average age of about 26 years) and their re‐entry (at an average age of about 32 years), the long‐term repeaters, on the average, experienced real growth in verbal ability and, to a lesser extent, in quantitative ability. This appears to be understandable given certain assumptions about the characteristics of long‐term repeaters and about their careers between test administrations. For example, as individuals who are highly selected in terms of academic and intellectual interests and abilities, long‐term repeaters are likely to engage in interim activities that call for the exercise of relatively high‐level cognitive skills. Most activities make demands on verbal abilities, but quantitative demands are specific to particular fields of endeavor. Individuals identified with more verbally oriented academic fields are unlikely to participate in interim activities that are quantitatively demanding. Moreover, participation in activities that call for the acquisition and use of knowledge, skills, understanding, and other elements from a given ability domain is assumed to be necessary for growth in that ability domain. Findings of larger average verbal than quantitative gains, and field effects for quantitative but not verbal gains, are consistent with a set of conditions such as that outlined above.
Evidence regarding the intervening activities of long‐term repeaters is needed to evaluate these assumptions, and to identify concomitants of subgroup differences in average gain.

The study findings clearly support the announced GRE Program policy of not reporting scores five or more years old. Although only General Test scores were examined in this study, by inference the policy implications are even more applicable to older Subject Test scores; growth or decline in subject‐matter achievement is likely to be sharper. Also, if the evidence and lines of reasoning developed in the study are accepted as supportive of a decision not to report scores that are five or more years old, it follows logically that a policy of caution in accepting scores that are four or even three years old may be equally advisable. Further consideration of policies regarding the acceptability of scores less than five years old, as a function of recency, appears to be warranted by study findings, which also suggest the need for further consideration of policies regarding the treatment of multiple test scores generally.