Open Access
Landmark detection for distinctive feature-based speech recognition
Author(s) - Sharlene A. Liu
Publication year - 1994
Publication title - The Journal of the Acoustical Society of America
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.619
H-Index - 187
eISSN - 1520-8524
pISSN - 0001-4966
DOI - 10.1121/1.411152
Subject(s) - landmark , computer science , salient , speech recognition , utterance , feature (linguistics) , word error rate , waveform , pattern recognition (psychology) , artificial intelligence , detector , linguistics , telecommunications , philosophy , radar
This thesis is a component of a proposed knowledge-based speech recognition system that uses landmarks to guide the search for distinctive features. In an utterance, landmarks identify localized regions where the acoustic manifestations of the linguistically motivated distinctive features are most salient. This thesis describes an algorithm for automatically detecting landmarks associated with segments having abrupt acoustics. As a by-product of landmark detection, the algorithm also provides hypotheses about the underlying broad phonetic class at each landmark. The algorithm is hierarchically structured and is rooted in linguistic and speech production theory. It uses several factors to detect landmarks: energy abruptness in five frequency bands and at two levels of temporal resolution, segmental duration, specific broad phonetic class constraints, and articulatory constraints. Landmark detection experiments were performed on clean speech (including TIMIT), speech in noise, and telephone speech. On clean speech, the landmark detector performed relatively well, with a detection rate of about 90% when the correct landmark type was required and 94% when it was not. The insertion rate was 6% to 9%. An analysis of the temporal precision of the landmark detector showed that a large majority of the landmarks were detected within 20 ms of the landmark transcription, and almost all were within 30 ms. For both speech in noise and telephone speech, performance degraded, as expected given the reduced information content of the speech signal. For each set of experiments, the landmark detection algorithm was manually customized to the database using knowledge about speech and the operating environment. One consequence of this knowledge-driven approach is that there is no degradation in performance between what is typically called the "training" data set and the test data set.
This approach also allows careful evaluation and further improvements to be made in a methodical manner. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
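The abstract names energy abruptness "in five frequency bands and at two levels of temporal resolution" as a cue but does not spell out the procedure. As a rough illustration only, the Python sketch below shows one way such a cue could be turned into candidate landmarks: a coarse rate-of-rise pass finds abrupt energy changes in each band, a fine pass relocates each change more precisely, and a change is kept only when several bands agree. Everything here is an assumption made for the sketch, not the thesis's method: the function names, the span sizes, the 9 dB threshold, the three-band agreement rule, and the "+"/"-" labels (abrupt rise/fall, not the thesis's landmark types). The input is assumed to be per-frame band energies in dB computed elsewhere.

```python
from typing import List, Sequence, Tuple


def rate_of_rise(track: Sequence[float], span: int) -> List[float]:
    """Energy change (dB) from `span` frames before t to `span` frames after t.

    Indices are clamped at the edges of the track.
    """
    n = len(track)
    return [track[min(t + span, n - 1)] - track[max(t - span, 0)] for t in range(n)]


def pick_peaks(ror: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (frame, sign) for each run of same-sign frames with |ror| >= threshold.

    The reported frame is the middle of the frames that reach the run's maximum.
    Assumes threshold > 0.
    """
    peaks = []
    t, n = 0, len(ror)
    while t < n:
        if abs(ror[t]) < threshold:
            t += 1
            continue
        sign = 1 if ror[t] > 0 else -1
        start = t
        while t < n and abs(ror[t]) >= threshold and (ror[t] > 0) == (sign > 0):
            t += 1
        run = range(start, t)
        best = max(abs(ror[i]) for i in run)
        tied = [i for i in run if abs(ror[i]) == best]
        peaks.append((tied[len(tied) // 2], sign))
    return peaks


def refine(frame: int, sign: int, fine_ror: Sequence[float], window: int) -> int:
    """Move a coarse peak to the strongest same-direction fine-resolution change nearby."""
    lo, hi = max(frame - window, 0), min(frame + window, len(fine_ror) - 1)
    candidates = range(lo, hi + 1)
    best = max(sign * fine_ror[i] for i in candidates)
    tied = [i for i in candidates if sign * fine_ror[i] == best]
    return tied[len(tied) // 2]


def detect_landmarks(
    bands: Sequence[Sequence[float]],
    coarse_span: int = 3,     # hypothetical coarse resolution, in frames
    fine_span: int = 1,       # hypothetical fine resolution, in frames
    threshold: float = 9.0,   # hypothetical abruptness threshold, in dB
    min_bands: int = 3,       # hypothetical cross-band agreement rule
    merge_frames: int = 2,    # band peaks this close are treated as one event
) -> List[Tuple[int, str]]:
    """Return (frame, '+' for abrupt rise / '-' for abrupt fall) candidate landmarks."""
    hits = []  # (frame, sign, band index)
    for b, track in enumerate(bands):
        coarse = rate_of_rise(track, coarse_span)
        fine = rate_of_rise(track, fine_span)
        for frame, sign in pick_peaks(coarse, threshold):
            hits.append((refine(frame, sign, fine, coarse_span), sign, b))

    landmarks = []
    for sign in (1, -1):
        members = sorted((f, b) for f, s, b in hits if s == sign)
        group: List[Tuple[int, int]] = []
        # The (None, None) sentinel flushes the last group.
        for f, b in members + [(None, None)]:
            if group and (f is None or f - group[-1][0] > merge_frames):
                if len({band for _, band in group}) >= min_bands:
                    frames = [fr for fr, _ in group]
                    landmarks.append((frames[len(frames) // 2], "+" if sign > 0 else "-"))
                group = []
            if f is not None:
                group.append((f, b))
    return sorted(landmarks)
```

For example, five bands that all jump from 20 dB to 60 dB at frame 15 and fall back to 20 dB at frame 30 yield `[(15, '+'), (30, '-')]`; if only two bands show the change, the hypothetical three-band rule rejects it and the result is empty. The coarse pass is robust to brief fluctuations, while the fine pass sharpens the time estimate, which is the kind of trade-off that temporal-precision figures such as "within 20 ms" measure.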