Vocal imitation of synthesised sounds varying in pitch, loudness and spectral centroid
Author(s) - Adib Mehrabi, Simon Dixon, M. Sandler
Publication year - 2017
Publication title - The Journal of the Acoustical Society of America
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.619
H-Index - 187
eISSN - 1520-8524
pISSN - 0001-4966
DOI - 10.1121/1.4974825
Subject(s) - imitation , loudness , centroid , speech recognition , computer science , acoustics , feature (linguistics) , natural (archaeology) , psychology , artificial intelligence , linguistics , computer vision , history , physics , social psychology , philosophy , archaeology
Vocal imitations are often used to convey sonic ideas [Lemaitre, Dessein, Susini, and Aura (2011). Ecol. Psych. 23(4), 267-307]. For computer-based systems to interpret these vocalisations, it is useful to know what happens when people vocalise sounds whose acoustic features follow different temporal envelopes. In the present study, 19 experienced musicians and music producers were asked to imitate 44 sounds, each with one or two feature envelopes applied. The study addresses two main questions: (1) How accurately can people imitate ramp and modulation envelopes for pitch, loudness, and spectral centroid? (2) What happens to this accuracy when people are asked to imitate two feature envelopes simultaneously? The results show that experienced musicians can imitate pitch, loudness, and spectral centroid accurately, and that imitation accuracy is generally preserved when the imitated stimuli combine two, not necessarily congruent, features. This demonstrates the viability of using the voice as a natural means of expressing time series of two features simultaneously.
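The abstract does not say how the stimuli were synthesised or how imitation accuracy was scored. As a rough illustration of the terms it uses, the standard-library Python sketch below builds a linear ramp and a sinusoidal modulation envelope, drives a sine tone's pitch and amplitude with per-sample envelopes (two features at once), and measures spectral centroid as the magnitude-weighted mean frequency of a DFT frame. All function names and parameters, the sinusoidal modulation shape, and the magnitude (rather than power) weighting are assumptions for illustration, not the authors' stimulus design or analysis.

```python
import cmath
import math


def ramp_envelope(start, end, n_points):
    """Linear ramp from start to end over n_points values (hypothetical stimulus shape)."""
    if n_points == 1:
        return [float(start)]
    step = (end - start) / (n_points - 1)
    return [start + i * step for i in range(n_points)]


def modulation_envelope(centre, depth, mod_hz, n_points, points_per_second):
    """Sinusoidal modulation of +/- depth around centre at mod_hz (assumed shape)."""
    return [
        centre + depth * math.sin(2 * math.pi * mod_hz * i / points_per_second)
        for i in range(n_points)
    ]


def synth_tone(freqs_hz, amps, sample_rate):
    """Sine tone with per-sample frequency (pitch) and amplitude (loudness-like) envelopes.

    Phase is accumulated sample by sample, so a changing pitch envelope
    produces a smooth glide rather than discontinuities.
    """
    phase = 0.0
    out = []
    for f, a in zip(freqs_hz, amps):
        out.append(a * math.sin(phase))
        phase += 2 * math.pi * f / sample_rate
    return out


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame, via a naive DFT over bins 0..N/2."""
    n = len(frame)
    weighted, total = 0.0, 0.0
    for k in range(n // 2 + 1):
        x_k = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mag = abs(x_k)
        weighted += (k * sample_rate / n) * mag
        total += mag
    return weighted / total if total > 0 else 0.0


if __name__ == "__main__":
    sr, n = 8000, 64
    # Two envelopes at once: a rising pitch ramp and a sinusoidally modulated amplitude.
    pitch = ramp_envelope(800.0, 1200.0, n)
    amp = modulation_envelope(0.6, 0.3, 250.0, n, sr)
    tone = synth_tone(pitch, amp, sr)
    print(f"centroid of frame: {spectral_centroid(tone, sr):.1f} Hz")
```

A real analysis would use an FFT and short overlapping frames to track the centroid over time; the naive DFT here only keeps the sketch dependency-free.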