Adjusting the Frame: Biphasic Performative Control of Speech Rhythm
Author(s) - Samuel Delalez, Christophe d’Alessandro
Publication year - 2017
Publication title - Interspeech 2017
Language(s) - English
Resource type - Conference proceedings
DOI - 10.21437/interspeech.2017-396
Subject(s) - rhythm, speech recognition, computer science, gesture, syllable, intonation (linguistics), prosody, acoustics, artificial intelligence, linguistics, physics, philosophy
Performative time and pitch scaling is a new research paradigm for prosodic analysis by synthesis. In this paper, a system for real-time time and pitch scaling of recorded speech by means of hand or foot gestures is designed and evaluated. Pitch is controlled with the preferred hand, using a stylus on a graphic tablet. Time is controlled using rhythmic frames, or constriction gestures, defined by pairs of control points. The "Arsis" corresponds to the constriction (weak beat of the syllable) and the "Thesis" corresponds to the vocalic nucleus (strong beat of the syllable). This biphasic control of rhythmic units is performed by the non-preferred hand using a button. Pitch and time scales are modified according to these gestural controls with the help of a real-time pitch-synchronous overlap-add technique (RT-PSOLA). Rhythm and pitch control accuracy are assessed in a prosodic imitation experiment: the task is to reproduce the intonation and rhythm of various sentences. The results show that inter-vocalic durations differ by only 20 ms on average. The system appears to be a new and effective tool for performative speech and singing synthesis. Consequences and applications in speech prosody research are discussed.
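The biphasic timing control described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `segment_scale_factors`, the piecewise-constant scaling rule, and the example timings are all assumptions made for illustration. The idea shown is only that each pair of consecutive control points (Arsis, Thesis, Arsis, ...) in the recording is stretched or compressed to match the interval between the corresponding button events.

```python
from typing import List


def segment_scale_factors(source_points: List[float],
                          gesture_points: List[float]) -> List[float]:
    """Return one time-scale factor per segment between control points.

    source_points:  times (s) of the alternating Arsis/Thesis control
                    points in the recorded speech (hypothetical input).
    gesture_points: times (s) at which the performer triggered the same
                    control points with the button (hypothetical input).

    A factor above 1 means the segment is stretched; below 1, compressed.
    Assumption: scaling is constant within each segment.
    """
    if len(source_points) != len(gesture_points):
        raise ValueError("both sequences need the same number of control points")

    factors = []
    for i in range(1, len(source_points)):
        source_duration = source_points[i] - source_points[i - 1]
        performed_duration = gesture_points[i] - gesture_points[i - 1]
        if source_duration <= 0 or performed_duration <= 0:
            raise ValueError("control points must be strictly increasing")
        factors.append(performed_duration / source_duration)
    return factors
```

For example, with recorded control points at 0.0, 0.125, 0.375 and 0.5 s and button events at 0.0, 0.25, 0.5 and 0.75 s, the sketch yields factors of 2.0, 1.0 and 2.0: the two short segments are doubled in duration and the middle one is left unchanged. A time-scaling stage such as the RT-PSOLA mentioned above would then consume these factors; how the real system smooths or predicts them in real time is not specified in the abstract.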