The statistical analysis of acoustic phonetic data: exploring differences between spoken Romance languages
Author(s) -
Pigoli Davide,
Hadjipantelis Pantelis Z.,
Coleman John S.,
Aston John A. D.
Publication year - 2018
Publication title -
Journal of the Royal Statistical Society: Series C (Applied Statistics)
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 1.205
H-Index - 72
eISSN - 1467-9876
pISSN - 0035-9254
DOI - 10.1111/rssc.12258
Subject(s) - spectrogram , sound change , computer science , feature (linguistics) , speech recognition , variation (astronomy) , contrast (vision) , covariance , representation (politics) , transformation (genetics) , linguistics , natural language processing , artificial intelligence , mathematics , physics , biochemistry , chemistry , politics , astrophysics , political science , law , gene , philosophy , statistics
Summary - The historical and geographical spread from older to more modern languages has long been studied by examining textual changes and in terms of changes in phonetic transcriptions. However, it is more difficult to analyse language change from an acoustic point of view, although this is usually the dominant mode of transmission. We propose a novel analysis approach for acoustic phonetic data, where the aim will be to model the acoustic properties of spoken words statistically. We explore phonetic variation and change by using a time–frequency representation, namely the log‐spectrograms of speech recordings. We identify time and frequency covariance functions as a feature of the language; in contrast, mean spectrograms depend mostly on the particular word that has been uttered. We build models for the mean and covariances (taking into account the restrictions placed on the statistical analysis of such objects) and use these to define a phonetic transformation that models how an individual speaker would sound in a different language, allowing the exploration of phonetic differences between languages. Finally, we map back these transformations to the domain of sound recordings, enabling us to listen to the output of the statistical analysis. The approach proposed is demonstrated by using recordings of the words corresponding to the numbers from 1 to 10 as pronounced by speakers from five different Romance languages.