Comparison of speech spectra for additive type spectral distortion
Author(s) - B. Yegnanarayana, D. Raj Reddy
Publication year - 1978
Publication title - The Journal of the Acoustical Society of America
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.619
H-Index - 187
eISSN - 1520-8524
pISSN - 0001-4966
DOI - 10.1121/1.2016829
Subject(s) - speech recognition , bandlimiting , quantization (signal processing) , computer science , mathematics , spectral envelope , white noise , spectral shape analysis , distortion (music) , speech coding , cepstrum , noise (video) , algorithm , spectral line , bandwidth (computing) , artificial intelligence , physics , statistics , fourier transform , telecommunications , mathematical analysis , amplifier , astronomy , image (mathematics)
Parameters representing the smoothed spectral characteristics of short segments of speech are often used as features in speech processing systems. The main pattern recognition problem in speech is matching a test spectrum against a reference spectrum. In this paper we show that the matching methods usually adopted do not yield a true measure of the actual differences between the envelopes of the spectra. This is particularly true for additive types of noise degradation in speech. Two such types of degradation, namely the quantization noise of waveform encoding and additive bandlimited white noise, are considered for illustration. We show that the parameters, whether linear prediction coefficients or cepstral coefficients, do not represent the true spectral envelope information of the distorted signal, which explains the discrepancy among the various distance measures based on these parameters. We propose a more practical approach, which involves transforming one spectrum relative to the other to bring both to the same dynamic range before any comparison is made between them. The main result of this study is that the quantization distortion of ADPCM speech is not very significant even at low bit rates, whereas additive white noise is deleterious even at high signal-to-noise ratios. This result explains to some extent the good recognition capability of the Harpy speech recognition system for ADPCM speech, even at the lowest bit rate [B. Yegnanarayana and D. Raj Reddy, J. Acoust. Soc. Am. 62, S27 (A)].
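The effect described above can be illustrated with a small numerical sketch. It is not the authors' procedure, which the abstract does not specify. It assumes a synthetic three-resonance power spectrum, models additive white noise as a flat power floor, and equalizes dynamic range by adding a matching flat floor to the cleaner spectrum. All function names, the resonance frequencies, and the 20 dB signal-to-noise ratio are hypothetical choices made for illustration.

```python
import numpy as np


def dynamic_range_db(power):
    """Peak-to-valley ratio of a power spectrum, in dB."""
    return 10.0 * np.log10(power.max() / power.min())


def add_white_noise_floor(power, snr_db):
    """Add a flat (white) noise power level giving the requested overall SNR."""
    noise_level = power.mean() / (10.0 ** (snr_db / 10.0))
    return power + noise_level


def log_spectral_distance_db(p, q):
    """RMS difference between two log power spectra, in dB."""
    diff = 10.0 * np.log10(p) - 10.0 * np.log10(q)
    return float(np.sqrt(np.mean(diff ** 2)))


def match_dynamic_range(reference, target_range_db):
    """Add a flat floor c to `reference` so that its dynamic range equals
    `target_range_db`. Assumption: this is one simple way to equalize
    dynamic range, not necessarily the transformation used in the paper.
    Solves (max + c) / (min + c) = r for c."""
    r = 10.0 ** (target_range_db / 10.0)
    c = (reference.max() - r * reference.min()) / (r - 1.0)
    return reference + max(c, 0.0)  # never subtract power from the reference


# Hypothetical smooth "speech-like" envelope: three resonances (formant-like peaks).
freqs = np.linspace(0.0, 4000.0, 257)
clean = sum(1.0 / (1.0 + ((freqs - fc) / 100.0) ** 2)
            for fc in (500.0, 1500.0, 2500.0))

# Additive white noise fills the spectral valleys and shrinks the dynamic range.
noisy = add_white_noise_floor(clean, snr_db=20.0)

range_clean = dynamic_range_db(clean)
range_noisy = dynamic_range_db(noisy)

# Direct comparison versus comparison after equalizing the dynamic range.
distance_direct = log_spectral_distance_db(clean, noisy)
matched = match_dynamic_range(clean, range_noisy)
distance_matched = log_spectral_distance_db(matched, noisy)

print(f"dynamic range: clean {range_clean:.1f} dB, noisy {range_noisy:.1f} dB")
print(f"log spectral distance: direct {distance_direct:.3f} dB, "
      f"after matching {distance_matched:.3g} dB")
```

In this toy setting the two spectra share the same underlying envelope, so the direct log spectral distance is nonzero only because the noise floor has reduced the dynamic range of one of them. Once both are brought to the same dynamic range, the distance collapses toward zero. Real speech spectra, quantization noise, and distances based on linear prediction or cepstral coefficients would behave less cleanly than this.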