Closed-set speaker conditioned acoustic-to-articulatory inversion using bi-directional long short term memory network
Author(s) - Aravind Illa, Prasanta Ghosh
Publication year - 2020
Publication title - The Journal of the Acoustical Society of America
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.619
H-Index - 187
eISSN - 1520-8524
pISSN - 0001-4966
DOI - 10.1121/10.0000738
Subject(s) - speech recognition, inversion (geology), pooling, computer science, artificial neural network, acoustics, artificial intelligence, geology, paleontology, physics, structural basin
Estimating articulatory movements from speech acoustic representations is known as acoustic-to-articulatory inversion (AAI). In this work, a speaker conditioned AAI (SC AAI) is proposed using a bi-directional long short-term memory (LSTM) neural network, where training is performed by pooling acoustic-articulatory data from multiple speakers along with their corresponding speaker identity information. For this work, 7.24 h of multi-speaker acoustic-articulatory data are collected from 20 speakers speaking 460 English sentences. Experiments with 20 speakers indicate that the SC AAI model performs better than the speaker-dependent (SD) AAI model, improving the correlation coefficient between the original and estimated articulatory movements by 0.036 (absolute).
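The two ideas in the abstract, conditioning pooled data on speaker identity and scoring estimates by correlation coefficient, can be illustrated with a minimal sketch. It is not the authors' implementation. The one-hot speaker code, the three-dimensional toy acoustic frames, and the helper names `one_hot`, `condition_frames`, and `pearson_cc` are all assumptions made for illustration; the abstract only states that speaker identity information accompanies the pooled data. The bi-directional LSTM itself is omitted.

```python
import math


def one_hot(speaker_index, num_speakers):
    """Return a one-hot speaker code (an assumed encoding of speaker identity)."""
    if not 0 <= speaker_index < num_speakers:
        raise ValueError("speaker_index out of range")
    return [1.0 if i == speaker_index else 0.0 for i in range(num_speakers)]


def condition_frames(acoustic_frames, speaker_index, num_speakers):
    """Append the same speaker code to every acoustic frame of one utterance."""
    code = one_hot(speaker_index, num_speakers)
    return [list(frame) + code for frame in acoustic_frames]


def pearson_cc(original, estimated):
    """Pearson correlation coefficient between two equal-length trajectories."""
    if len(original) != len(estimated) or len(original) < 2:
        raise ValueError("trajectories must have equal length of at least 2")
    n = len(original)
    mean_o = sum(original) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(original, estimated))
    var_o = sum((o - mean_o) ** 2 for o in original)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    if var_o == 0 or var_e == 0:
        raise ValueError("correlation is undefined for a constant trajectory")
    return cov / math.sqrt(var_o * var_e)


# Toy utterance: two frames of three hypothetical acoustic features each,
# spoken by speaker 2 out of the 20 speakers mentioned in the abstract.
frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
conditioned = condition_frames(frames, speaker_index=2, num_speakers=20)
```

Each conditioned frame here has 3 acoustic values followed by a 20-dimensional speaker code, so utterances from all speakers can be pooled into one training set while the model still sees who is speaking. In an evaluation like the one reported, `pearson_cc` would be applied per articulatory trajectory to the original and estimated movements.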