
This that and the other: Multi-word clusters in spoken English as visible patterns of interaction
Author(s) - Michael McCarthy, Ronald Carter
Publication year - 2019
Publication title - Teanga, the Journal of the Irish Association for Applied Linguistics
Language(s) - English
Resource type - Journals
eISSN - 2565-6325
pISSN - 0332-205X
DOI - 10.35903/teanga.v21i0.173
Subject(s) - computer science, linguistics, word (group theory), natural language processing, fluency, vagueness, contrast (vision), lexis, grammar, vocabulary, sort, artificial intelligence, information retrieval, philosophy, fuzzy logic
This paper investigates multi-word strings automatically retrieved from a 5-million-word corpus of conversational English from Britain and Ireland. Many such strings have neither syntactic nor semantic integrity, for example "at the", "it was a", "what do you". However, many strings display pragmatic integrity, encoding interactive functions such as hedging, vagueness and discourse marking. Examples include "and that sort of thing", "you know", "a couple of". We identify the most common pragmatically integrated clusters and discuss their functions, and compare their frequency with that of single words, illustrating that many clusters are more frequent than single words accepted as belonging to the core vocabulary of English. The high frequency of these clusters also contrasts with the low frequency of opaque idiomatic expressions. High-frequency clusters raise issues around the distinction between lexis and grammar, and support a synthetic view of language production and storage, with implications for the understanding of notions such as fluency and idiomaticity.