Open Access
Handling OOV Words in Mandarin Spoken Term Detection with an Hierarchical n-Gram Language Model
Author(s) - Wang Xuyang, Zhang Pengyuan, Na Xingyu, Pan Jielin, Yan Yonghong
Publication year - 2017
Publication title - Chinese Journal of Electronics
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.267
H-Index - 25
eISSN - 2075-5597
pISSN - 1022-4653
DOI - 10.1049/cje.2017.07.004
Subject(s) - Mandarin Chinese, n-gram, computer science, spoken language, natural language processing, language model, linguistics, speech recognition, artificial intelligence
In this paper, a hierarchical n-gram language model (LM) combining words and characters is explored to improve the detection of out-of-vocabulary (OOV) words in Mandarin spoken term detection (STD). The hierarchical LM is built on a word-level LM, with a character-level LM estimating the probabilities of OOV words in a class-based way. Regions of the sentence being decoded that contain OOV words are detected with the help of the word-level LM, and the probabilities of those OOV words are then derived from the character-level LM. The proposed approach is implemented on top of a dynamic decoder and evaluated in terms of actual term-weighted value (ATWV) on two Mandarin data sets. Experimental results show a relative improvement of more than 10% in OOV word detection on both sets, while the detection of in-vocabulary (IV) words is barely affected.
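The class-based word/character combination described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: both LMs are reduced to bigrams stored in dictionaries; the `HierarchicalLM` class, the `<OOV>`, `<w>` and `</w>` symbols and the `FLOOR` constant for unseen n-grams are all hypothetical; and the OOV region detection performed inside the paper's dynamic decoder is replaced by an already segmented word sequence. An OOV word w following history h is scored as P(<OOV> | h) multiplied by its character-LM probability.

```python
import math

OOV = "<OOV>"             # hypothetical word-level class token shared by all OOV words
BOS = "<s>"               # sentence-start symbol for the word LM
BOW, EOW = "<w>", "</w>"  # hypothetical word-boundary symbols for the character LM
FLOOR = 1e-6              # crude floor for unseen n-grams (assumption, not real smoothing)


class HierarchicalLM:
    """Word bigram LM whose <OOV> class is expanded by a character bigram LM.

    Class-based scoring: P(w | h) = P_word(<OOV> | h) * P_char(w) for w not in vocab.
    """

    def __init__(self, word_bigram, char_bigram, vocab):
        self.word_bigram = word_bigram  # {(prev_word, word): probability}
        self.char_bigram = char_bigram  # {(prev_char, char): probability}
        self.vocab = set(vocab)

    def char_logprob(self, word):
        """log P_char(word) as a chain of character bigrams from <w> to </w>."""
        chars = [BOW, *word, EOW]
        return sum(
            math.log(self.char_bigram.get((prev, cur), FLOOR))
            for prev, cur in zip(chars, chars[1:])
        )

    def word_logprob(self, prev, word):
        """log P(word | prev); prev must already be a vocab word, BOS or OOV."""
        if word in self.vocab:
            return math.log(self.word_bigram.get((prev, word), FLOOR))
        # OOV word: mass of the <OOV> class times the word's character-LM score.
        return math.log(self.word_bigram.get((prev, OOV), FLOOR)) + self.char_logprob(word)

    def sentence_logprob(self, words):
        """Sum of log P(w_i | w_{i-1}) over a segmented sentence (no </s> term)."""
        total, prev = 0.0, BOS
        for word in words:
            total += self.word_logprob(prev, word)
            prev = word if word in self.vocab else OOV  # OOV history collapses to the class
        return total


# Toy usage with made-up probabilities (illustrative only, not from the paper).
lm = HierarchicalLM(
    word_bigram={(BOS, "我"): 0.5, (BOS, OOV): 0.1, ("我", "爱"): 0.6,
                 ("爱", "北京"): 0.3, ("爱", OOV): 0.2},
    char_bigram={(BOW, "天"): 0.1, ("天", "津"): 0.5, ("津", EOW): 0.8},
    vocab={"我", "爱", "北京"},
)
print(math.exp(lm.sentence_logprob(["我", "爱", "天津"])))  # 0.5 * 0.6 * (0.2 * 0.1 * 0.5 * 0.8)
```

In this sketch, IV words are scored by the word LM exactly as they would be without the character LM, which gives some intuition for why such a scheme need not disturb IV detection. A real system would use properly smoothed, higher-order models for both levels.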
