Visual speech synthesis from 3D video
Author(s) - James D. Edge, Adrian Hilton
Publication year - 2006
Publication title - Surrey Open Research Repository (University of Surrey)
Language(s) - English
Resource type - Conference proceedings
DOI - 10.1049/cp:20061940
Subject(s) - computer science , animation , viseme , speech synthesis , speech recognition , process (computing) , computer animation , path (computing) , artificial intelligence , graph , computer facial animation , computer vision , computer graphics (images) , speech technology , theoretical computer science , programming language
In this paper we describe a parameterisation of lip movements which maintains the dynamic structure inherent in the task of producing speech sounds. A stereo capture system is used to reconstruct 3D models of a speaker producing sentences from the TIMIT corpus. This data is mapped into a space which maintains the relationships between samples and their temporal derivatives. By incorporating dynamic information within the parameterisation of lip movements, we can model the cyclical structure, as well as the causal nature, of speech movements as described by an underlying visual speech manifold. It is believed that such a structure will be applicable to various areas of speech modelling, in particular the synthesis of speech lip movements.
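The abstract describes embedding lip-motion samples together with their temporal derivatives before building a low-dimensional parameterisation. A minimal sketch of that general idea (not the authors' exact method; the data, dimensions, and use of PCA here are illustrative assumptions) is to augment each captured frame with a finite-difference estimate of its velocity and then project the augmented vectors:

```python
import numpy as np

# Hypothetical stand-in for captured 3D lip data:
# 100 frames, each a 30-dimensional vector of lip-vertex coordinates.
rng = np.random.default_rng(0)
frames = rng.normal(size=(100, 30))

# Central finite differences approximate the temporal derivative
# of each coordinate across the sequence.
derivs = np.gradient(frames, axis=0)

# Augment each sample with its derivative so that the embedding
# preserves both position and direction of motion (a phase-space view).
augmented = np.hstack([frames, derivs])  # shape (100, 60)

# PCA (via SVD) on the augmented samples yields a low-dimensional
# space in which trajectories retain their dynamic structure.
mean = augmented.mean(axis=0)
centered = augmented - mean
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
k = 5  # illustrative choice of embedding dimension
params = centered @ Vt[:k].T  # a k-dimensional trajectory per sequence
```

Because each point in `params` encodes velocity as well as position, two frames with identical lip shapes but opposite motion directions map to distinct locations, which is what allows the cyclical and causal structure of speech movements to be represented.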