A statistical comparison of written language and nonlinguistic symbol systems
Author(s) - Richard Sproat
Publication year - 2014
Publication title - Language
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.115
H-Index - 76
eISSN - 1535-0665
pISSN - 0097-8507
DOI - 10.1353/lan.2014.0031
Are statistical methods useful in distinguishing written language from nonlinguistic symbol systems? Some recent articles (Rao et al. 2009a, Lee et al. 2010a) have claimed so. Both of these previous articles use measures based at least in part on bigram conditional entropy, and subsequent work by one of the authors (Rao) has used other entropic measures. In both cases the authors have argued that the methods proposed either are useful for discriminating between linguistic and nonlinguistic systems (Lee et al.), or at least count as evidence of a more ‘inductive’ kind for the status of a system (Rao et al.). Using a larger set of nonlinguistic and comparison linguistic corpora than were used in these and other studies, I show that none of the previously proposed methods are useful as published. However, one of the measures proposed by Lee and colleagues (2010a) (with a different cut-off value) and a novel measure based on repetition turn out to be good measures for classifying symbol systems into the two categories. For the two ancient symbol systems of interest to Rao and colleagues (2009a) and Lee and colleagues (2010a)—Indus Valley inscriptions and Pictish symbols, respectively—both of these measures classify them as nonlinguistic, contradicting the findings of those previous works.
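Several of the measures discussed above build on bigram conditional entropy, the uncertainty about the next symbol given the current one, H(X_{n+1} | X_n) = -Σ p(a, b) log2 p(b | a). The sketch below computes this quantity with plain maximum-likelihood (relative-frequency) estimates from a single symbol sequence. It is an illustration of the general quantity only. It is not the exact formulation, normalization, or smoothing used by Rao et al., Lee et al., or this article, and the function name `bigram_conditional_entropy` is hypothetical. Raw estimates like this also depend on corpus size and symbol inventory, so they should not be compared across corpora without care.

```python
import math
from collections import Counter


def bigram_conditional_entropy(tokens):
    """Estimate H(X_{n+1} | X_n) in bits from one symbol sequence.

    Uses unsmoothed maximum-likelihood estimates of p(a, b) and p(b | a).
    Returns 0.0 if the sequence has fewer than two symbols.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = sum(bigrams.values())
    if total == 0:
        return 0.0
    # Count how often each symbol appears in the conditioning (first) position.
    left = Counter()
    for (a, _b), count in bigrams.items():
        left[a] += count
    h = 0.0
    for (a, b), count in bigrams.items():
        p_joint = count / total   # p(a, b)
        p_cond = count / left[a]  # p(b | a)
        h -= p_joint * math.log2(p_cond)
    return h


# A perfectly alternating sequence is fully predictable from the previous symbol.
print(bigram_conditional_entropy(list("abababab")))  # 0.0
# After "a", the next symbol is "a" or "b" with equal frequency here.
print(bigram_conditional_entropy(list("aabb")))
```

The input is any sequence of hashable symbols, so the same function applies to characters, word tokens, or sign identifiers transcribed from an inscription corpus.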