Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments
Author(s) -
Nils Holzenberger,
Mingxing Du,
Julien Karadayi,
Rachid Riad,
Emmanuel Dupoux
Publication year - 2018
Publication title -
Interspeech 2018
Language(s) - English
Resource type - Conference proceedings
DOI - 10.21437/interspeech.2018-2364
Subject(s) - upsampling, computer science, variable (mathematics), feature (linguistics), word (group theory), speech recognition, variety (cybernetics), artificial intelligence, ABX test, natural language processing, pattern recognition (psychology), mathematics, statistics, philosophy, mathematical analysis, linguistics, geometry, image (mathematics)
Fixed-length embeddings of words are very useful for a variety of tasks in speech and language processing. Here we systematically explore two methods of computing fixed-length embeddings for variable-length sequences. We evaluate their susceptibility to phonetic and speaker-specific variability on English, a high-resource language, and Xitsonga, a low-resource language, using two evaluation metrics: ABX word discrimination and ROC-AUC on same-different phoneme n-grams. We show that a simple downsampling method supplemented with length information can outperform the variable-length input feature representation on both evaluations. Recurrent autoencoders, trained without supervision, can yield even better results at the expense of increased computational complexity.
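To illustrate the downsampling idea from the abstract, the sketch below maps a variable-length feature sequence (e.g. T frames of acoustic features) to a fixed-size vector by interpolating each feature dimension at a fixed number of time points and appending the segment length. This is a hypothetical minimal sketch of the general technique, not the paper's exact procedure; the function name and the choice of 10 sample points are assumptions for illustration.

```python
import numpy as np

def downsample_embedding(features: np.ndarray, n_samples: int = 10) -> np.ndarray:
    """Fixed-length embedding of a (T, d) feature sequence.

    Uniformly downsamples each feature dimension to n_samples points
    via linear interpolation, flattens, and appends the original
    length T as length information. Illustrative sketch only.
    """
    T, d = features.shape
    # Evenly spaced query positions over the original time axis.
    positions = np.linspace(0, T - 1, n_samples)
    idx = np.arange(T)
    # Interpolate each of the d feature dimensions independently.
    sampled = np.stack(
        [np.interp(positions, idx, features[:, j]) for j in range(d)],
        axis=1,
    )
    # Flatten to a vector and append the segment length.
    return np.concatenate([sampled.ravel(), [float(T)]])

# Two segments of different lengths map to vectors of identical size.
short_emb = downsample_embedding(np.random.randn(23, 13))
long_emb = downsample_embedding(np.random.randn(180, 13))
```

Because the output dimension (n_samples * d + 1) is independent of T, such embeddings can be compared directly with cosine or Euclidean distance in evaluations like ABX word discrimination.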