
Unit Generation Based on Phrase Break Strength and Pruning for Corpus‐Based Text‐to‐Speech
Author(s) - Kim Sanghun, Lee Youngjik, Hirose Keikichi
Publication year - 2001
Publication title - ETRI Journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.295
H-Index - 46
eISSN - 2233-7326
pISSN - 1225-6463
DOI - 10.4218/etrij.01.0101.0403
Subject(s) - speech synthesis , computer science , pruning , vector quantization , speech recognition , phrase , sentence , cluster analysis , artificial intelligence , reduction (mathematics) , word error rate , set (abstract data type) , natural language processing , mathematics , geometry , agronomy , biology , programming language
This paper discusses two important issues in corpus‐based synthesis: synthesis unit generation based on phrase break strength information and pruning of redundant synthesis unit instances. First, a new sentence set for recording was designed to build an efficient synthesis database reflecting the characteristics of the Korean language. To obtain prosodic-context-sensitive units, we graded major prosodic phrases into five distinct levels according to pause length and then discriminated intra‐word triphones using these levels. Using synthesis units carrying phrase break strength information, synthetic speech was generated and evaluated subjectively. Second, a new pruning method based on weighted vector quantization (WVQ) was proposed to eliminate redundant synthesis unit instances from the synthesis database. WVQ takes the relative importance of each instance into account when clustering similar instances with a vector quantization (VQ) technique. The proposed method was compared, through objective and subjective evaluations of synthetic speech quality, with two conventional pruning methods: one that simply limits the maximum number of instances, and one based on normal VQ‐based clustering. For the same reduction rate in the number of instances, the proposed method showed the best performance. At a 45% reduction rate, the synthetic speech showed almost no perceptible degradation compared with speech synthesized without instance reduction.
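As a rough illustration of the break-strength grading described in the abstract, the following Python sketch maps a pause duration to one of five break-strength levels and attaches that level to an intra-word triphone label. The threshold values, the "BS" label format, and the function names are assumptions made here for illustration; the paper does not specify them.

    # A minimal sketch, assuming illustrative pause thresholds and a made-up
    # label format: grade a pause into one of five break-strength levels and
    # attach the level to a triphone label. None of these values are from the paper.

    PAUSE_THRESHOLDS_MS = [50, 120, 250, 500]  # assumed boundaries between levels 1..5

    def break_strength(pause_ms: float) -> int:
        """Map a pause duration in milliseconds to a break-strength level 1..5."""
        level = 1
        for threshold in PAUSE_THRESHOLDS_MS:
            if pause_ms > threshold:
                level += 1
        return level

    def label_triphone(left: str, center: str, right: str, pause_ms: float) -> str:
        """Attach the break-strength level to a triphone label, e.g. 'a-n+i/BS4'."""
        return f"{left}-{center}+{right}/BS{break_strength(pause_ms)}"

    # A 300 ms pause falls into level 4 under the assumed thresholds.
    print(label_triphone("a", "n", "i", 300.0))  # -> a-n+i/BS4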
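The WVQ pruning step can be pictured along the following lines: instances are clustered with a weighted variant of VQ (approximated below with weighted k-means), so that more important instances pull the codewords toward themselves, and a single representative per cluster is kept while the rest are discarded. The importance weights, the feature representation, the function name wvq_prune, and the choice of the highest-weight member as the cluster representative are assumptions of this sketch, not details taken from the paper.

    # Hypothetical sketch of WVQ-style pruning with weighted k-means standing
    # in for VQ. Inputs: instance feature vectors and per-instance importance
    # weights; output: indices of the instances kept after pruning.

    import numpy as np

    def wvq_prune(features, weights, n_clusters, n_iters=20, seed=0):
        """features: (N, D) float array; weights: (N,) importance scores."""
        rng = np.random.default_rng(seed)
        centroids = features[rng.choice(len(features), n_clusters, replace=False)]

        for _ in range(n_iters):
            # Assign each instance to its nearest centroid.
            dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
            assign = dists.argmin(axis=1)
            # Weighted centroid update: important instances pull centroids harder.
            for k in range(n_clusters):
                mask = assign == k
                if mask.any():
                    w = weights[mask][:, None]
                    centroids[k] = (w * features[mask]).sum(axis=0) / w.sum()

        # Keep the highest-weight instance of each cluster as its representative.
        kept = []
        for k in range(n_clusters):
            members = np.flatnonzero(assign == k)
            if members.size:
                kept.append(members[weights[members].argmax()])
        return np.array(kept)

    # Example: prune 1000 synthetic instances down to roughly 100 representatives.
    X = np.random.default_rng(1).normal(size=(1000, 12))
    w = np.random.default_rng(2).uniform(0.1, 1.0, size=1000)
    print(wvq_prune(X, w, n_clusters=100).shape)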