Open Access
Unsupervised optimal phoneme segmentation: theory and experimental evaluation
Author(s) - Qiao Yu, Luo Dean, Minematsu Nobuaki
Publication year - 2013
Publication title - IET Signal Processing
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.384
H-Index - 42
ISSN - 1751-9683
DOI - 10.1049/iet-spr.2012.0191
Subject(s) - computer science , segmentation , artificial intelligence , pattern recognition (psychology) , speech recognition
Automatic phoneme segmentation of a speech sequence is a basic problem in speech engineering. This study investigates unsupervised phoneme segmentation that uses no prior information about the linguistic content or acoustic models of the input sequence. The authors formulate unsupervised segmentation as an optimisation problem by means of maximum likelihood, and show that the optimal segmentation corresponds to minimising the coding length of the input sequence. Under different assumptions, five objective functions are developed, namely the log determinant, rate distortion (RD), Bayesian log determinant, Mahalanobis distance and Euclidean distance objectives. The authors prove that the optimal segmentations have transformation‐invariant properties, introduce a time‐constrained agglomerative clustering algorithm to find the optimal segmentations, and propose an efficient implementation of the algorithm using integration functions. Experiments are carried out on the TIMIT database to compare the five objective functions. The results show that RD achieves the best performance, and that the proposed method outperforms previous unsupervised segmentation methods.
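The abstract gives no formulas, so the following is only an illustrative sketch of the general idea of time-constrained agglomerative clustering, not the authors' implementation. It assumes a Euclidean-style objective (within-segment sum of squared deviations from the segment mean), a greedy bottom-up merge of adjacent segments only, and prefix sums standing in for the "integration functions" so that any segment cost is computed in constant time. The names `segment_cost` and `time_constrained_agglomerative`, and the stopping rule based on a target number of segments, are assumptions made for illustration.

```python
import numpy as np


def segment_cost(S1, S2, i, j):
    """Within-segment squared error for frames i..j-1, using prefix sums.

    S1[k] holds the sum of the first k frames, S2[k] the sum of their
    squared norms (both hypothetical helpers for this sketch).
    """
    n = j - i
    s = S1[j] - S1[i]              # sum of the frames in the segment
    q = S2[j] - S2[i]              # sum of squared norms in the segment
    return q - np.dot(s, s) / n    # sum ||x - mean||^2


def time_constrained_agglomerative(X, n_segments):
    """Greedily merge adjacent segments until n_segments remain.

    X is a (T, D) array of feature frames. Returns the list of boundary
    indices, starting at 0 and ending at T.
    """
    X = np.asarray(X, dtype=float)
    T, D = X.shape
    # Prefix sums let any segment cost be evaluated in O(D).
    S1 = np.vstack([np.zeros((1, D)), np.cumsum(X, axis=0)])
    S2 = np.concatenate([[0.0], np.cumsum(np.sum(X * X, axis=1))])

    bounds = list(range(T + 1))    # start with one segment per frame
    while len(bounds) - 1 > n_segments:
        best_k, best_inc = None, np.inf
        # Time constraint: only neighbouring segments may be merged.
        for k in range(1, len(bounds) - 1):
            a, b, c = bounds[k - 1], bounds[k], bounds[k + 1]
            inc = (segment_cost(S1, S2, a, c)
                   - segment_cost(S1, S2, a, b)
                   - segment_cost(S1, S2, b, c))
            if inc < best_inc:
                best_k, best_inc = k, inc
        del bounds[best_k]         # remove the boundary with the cheapest merge
    return bounds
```

As a usage example, for a toy input made of three constant plateaus of 5, 4 and 6 frames, calling `time_constrained_agglomerative(X, 3)` recovers the plateau boundaries. Other objectives named in the abstract, such as the log determinant, would need second-order prefix sums (of outer products) rather than the scalar squared norms used here.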