
THE PREDICTION OF TOEFL LISTENING COMPREHENSION ITEM DIFFICULTY FOR MINITALK PASSAGES: IMPLICATIONS FOR CONSTRUCT VALIDITY
Author(s) - Freedle, Roy; Kostin, Irene
Publication year - 1996
Publication title - ETS Research Report Series
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.235
H-Index - 5
ISSN - 2330-8516
DOI - 10.1002/j.2333-8504.1996.tb01707.x
Subject(s) - Test of English as a Foreign Language, psychology, active listening, inference, reading comprehension, comprehension, construct validity, cognitive psychology, linguistics, natural language processing, reading (process), computer science, language assessment, developmental psychology, artificial intelligence, psychometrics, mathematics education, communication
The purpose of the current study was to predict the difficulty of a large sample (n = 337) of Test of English as a Foreign Language (TOEFL®) listening comprehension items that dealt with the minitalk (listening) passages. Four item types were examined: Main Idea items (with two subtypes, explicit gist and implicit gist); Supporting Idea items (also called explicit detail items); and two types of Inference items, namely pure inference items and inference‐application items (both of which are also called implicit detail items). A related purpose was to examine whether particular types of predictors (i.e., text and text‐associated variables) play a significant role in predicting item difficulty. We maintain that evidence favoring construct validity requires, in part, significant contributions from these text and text‐associated predictor variables. This paper also explores the hypothesis that multiple‐choice listening comprehension tests are sensitive to many of the sentential and discourse variables found to influence comprehension processes in the experimental language comprehension literature. Earlier work with reading items (Freedle & Kostin, 1993) and the current study of listening (minitalk) items show that the majority of sentential and discourse variables identified in our review of the experimental language literature were significantly related to item difficulty within TOEFL's multiple‐choice format. Furthermore, contrary to predictions that we attributed to several critics of multiple‐choice tests, the pattern of correlational results showed a significant relationship between item difficulty and the text and text‐related variables. We interpreted this pattern of results as modest evidence supporting our claim that multiple‐choice TOEFL listening and reading items yield measures consistent with one definition of a construct‐valid test of comprehension.
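To make the correlational claim concrete, the following is a minimal sketch of how a single text variable might be correlated with item difficulty. It assumes the conventional ETS delta form, delta = 13 − 4z, where z is the standard-normal deviate for an item's proportion correct; the equating step behind the study's equated delta is omitted. The helper names (`ets_delta`, `pearson_r`, `t_for_r`) and the toy data are hypothetical illustrations, not the authors' code or results.

```python
from math import sqrt
from statistics import NormalDist, mean

def ets_delta(p_correct: float) -> float:
    """Map proportion correct to an unequated ETS-style delta (assumed form 13 - 4z).

    z is the standard-normal deviate below which p_correct of the area lies,
    so p = 0.5 maps to 13 and harder items (lower p) get larger deltas.
    """
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between 0 and 1")
    return 13.0 - 4.0 * NormalDist().inv_cdf(p_correct)

def pearson_r(x, y) -> float:
    """Pearson product-moment correlation of two equal-length samples."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length samples with at least 3 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_for_r(r: float, n: int) -> float:
    """t statistic on n - 2 degrees of freedom for testing H0: rho = 0."""
    return r * sqrt((n - 2) / (1.0 - r * r))

# Hypothetical toy data (not from the study): proportion correct for five items
# and an illustrative text variable, the number of words in each item's passage.
p_values = [0.82, 0.70, 0.55, 0.48, 0.35]
passage_words = [120, 150, 180, 200, 240]
deltas = [ets_delta(p) for p in p_values]
r = pearson_r(passage_words, deltas)
print(round(r, 3), round(t_for_r(r, len(deltas)), 2))
```

With real data, the t value would be referred to a t distribution on n − 2 degrees of freedom to judge whether the correlation differs significantly from zero.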
That is, because critics have pointed out that reading items, at least, can often be answered correctly without reading the text passage, item variables (rather than text or text‐related variables) should on their view be the most prominent predictors of item difficulty, both correlationally and in a regression sense. Because the contrary relationship was in fact found in several analyses of the minitalk items, we concluded that this outcome provides some modest evidence favoring the construct validity of the minitalk passages and their associated items. Various stepwise and hierarchical regression analyses showed that many of these text and text‐related variables make independent contributions to predicting listening item difficulty. More specifically, apart from the correlational results, the stepwise linear regressions yielded the following results. For the full sample of 337 listening items (containing all four item types), with equated delta (an index of item difficulty) as the dependent variable, 14 variables accounted for 35% (p < .0001) of the variance in listening item difficulty. Eleven of the 14 variables reflected significant and independent contributions from text and text/item overlap variables. (The remaining three reflected the contribution of item type, which we argue is a category distinct from a pure item variable.) By implication, this result provides limited evidence favoring the construct validity of the TOEFL listening comprehension items. A hierarchical regression analysis of this sample also provided additional evidence that we interpreted as consistent with a construct‐valid test. Alternative approaches to examining construct validity issues are briefly described. Further analyses explore the distinction between nested and nonnested data sets. A nonnested data set consists of listening passages where each passage has a single multiple‐choice item associated with it.
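The stepwise and hierarchical analyses can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: predictors enter by the largest gain in R² (the proportion of variance accounted for, the quantity behind the 35% figure) subject to a hypothetical minimum gain `min_gain`, rather than the F-to-enter significance tests that statistical packages typically apply, and the abstract does not state the exact entry criteria used. `hierarchical_delta_r2` enters predefined blocks in a chosen order and reports each block's R² increment; one might, for instance, enter item-type codes before text variables, but the abstract does not specify the block order. All function names and data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (collinear predictors?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(cols: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """R^2 of an ordinary least-squares fit of y on cols plus an intercept."""
    n = len(y)
    X = [[1.0] + [float(c[i]) for c in cols] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    beta = _solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def forward_stepwise(preds: Dict[str, Sequence[float]], y: Sequence[float],
                     min_gain: float = 0.01) -> List[str]:
    """Greedy forward selection: repeatedly add the predictor with the largest
    R^2 gain, stopping when no remaining candidate adds at least min_gain."""
    chosen: List[str] = []
    current = 0.0
    while True:
        best: Optional[str] = None
        best_r2 = current
        for name in preds:
            if name in chosen:
                continue
            r2 = r_squared([preds[v] for v in chosen + [name]], y)
            if r2 - current >= min_gain and r2 > best_r2:
                best, best_r2 = name, r2
        if best is None:
            return chosen
        chosen.append(best)
        current = best_r2

def hierarchical_delta_r2(blocks: List[List[str]], preds: Dict[str, Sequence[float]],
                          y: Sequence[float]) -> List[float]:
    """Enter predictor blocks in the given order; return each block's R^2 increment."""
    entered: List[str] = []
    increments: List[float] = []
    previous = 0.0
    for block in blocks:
        entered = entered + list(block)
        r2 = r_squared([preds[v] for v in entered], y)
        increments.append(r2 - previous)
        previous = r2
    return increments

# Hypothetical toy data (not from the study): y depends on x1 only.
preds = {"x1": [1, 2, 3, 4, 5, 6], "x2": [2, 1, 4, 3, 6, 5]}
y = [1 + 2 * v for v in preds["x1"]]
print(forward_stepwise(preds, y))  # -> ['x1']
print(hierarchical_delta_r2([["x2"], ["x1"]], preds, y))
```

Solving the normal equations directly is adequate for a toy example like this; a real analysis would use a numerically stabler least-squares routine and report proper significance tests for each entry.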
A nested data set, by contrast, consists of listening passages where each passage typically has two or more multiple‐choice items associated with it. Consistent with previous results for TOEFL's reading items (Freedle & Kostin, 1993), the results showed that this distinction appears to have important consequences for the level of item predictability in listening contexts. Regression analyses of nonnested data sets for the TOEFL listening (minitalk) items were restricted to individual item types (e.g., only inference items or only supporting idea items were analyzed). The results indicated that, when analyses were restricted to data sets consisting of a single item type, nonnested data sets yielded higher predictability than nested data sets. In general, the cumulative evidence suggested that the distinction between nested and nonnested data sets (see also Freedle & Kostin, 1993, pp. 163–165) remains an important methodological point. Numerous comparisons are made between the significant predictors of TOEFL's reading items and those of TOEFL's listening (minitalk) items, and some explanations are suggested for the similarities and differences found between the TOEFL reading and listening (minitalk) sections.
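The nested/nonnested distinction can be illustrated with a small sketch. One possible way to obtain a nonnested set from nested data is to restrict the items to a single type (mirroring the restriction to individual item types) and then keep one item per passage. The record layout `(passage_id, item_id, item_type)`, the random choice of which item to retain, and the function names are assumptions for illustration; the abstract does not say how the study's nonnested sets were assembled.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (passage_id, item_id, item_type)
Item = Tuple[str, str, str]

def is_nested(items: List[Item]) -> bool:
    """True if any passage contributes two or more items to the data set."""
    counts = Counter(item[0] for item in items)
    return any(n > 1 for n in counts.values())

def nonnested_subset(items: List[Item], item_type: str, seed: int = 0) -> List[Item]:
    """Restrict to one item type, then keep one randomly chosen item per passage."""
    by_passage: Dict[str, List[Item]] = defaultdict(list)
    for item in items:
        if item[2] == item_type:
            by_passage[item[0]].append(item)
    rng = random.Random(seed)  # seeded so the selection is reproducible
    return [rng.choice(by_passage[pid]) for pid in sorted(by_passage)]

# Hypothetical toy data: passage P1 carries three items, so the full set is nested.
items = [
    ("P1", "1", "inference"),
    ("P1", "2", "inference"),
    ("P1", "3", "supporting"),
    ("P2", "4", "inference"),
    ("P3", "5", "supporting"),
]
subset = nonnested_subset(items, "inference")
print(is_nested(items), is_nested(subset), len(subset))  # -> True False 2
```

Fixing the seed makes the retained items reproducible; drawing several seeds would show how sensitive a regression result is to which item happens to represent each passage.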
In conclusion, we demonstrate the following: (1) listening as well as reading item difficulty can be significantly predicted by variables similar to those reported in the experimental literature on language comprehension; (2) some similarities and differences between listening and reading comprehension abilities can be studied using the methodology employed in the current study; and (3) the TOEFL listening (minitalk) items examined here appear to be construct valid in the restricted sense that variables coding for features of the whole passage and for selected aspects of the passage significantly influenced item difficulty. This pattern of results implies, we believe, that examinees were significantly influenced by relevant listening information presented in the passage and were actively attempting to comprehend the listening materials.