Single-Channel Speech Separation Based on Non-negative Matrix Factorization and Factorial Conditional Random Field

Open Access

Author(s) - LI Xu, TU Ming, WANG Xiaofei, WU Chao, FU Qiang, YAN Yonghong

Publication year - 2018

Publication title - Chinese Journal of Electronics

Language(s) - English

Resource type - Journals

SCImago Journal Rank - 0.267

H-Index - 25

eISSN - 2075-5597

pISSN - 1022-4653

DOI - 10.1049/cje.2018.06.016

Subject(s) - non-negative matrix factorization, conditional random field, matrix decomposition, speech recognition, computer science, channel (broadcasting), factorial, field (mathematics), matrix (chemical analysis), hidden Markov model, Markov random field, factorization, mathematics, source separation, algorithm, pattern recognition (psychology), artificial intelligence, telecommunications, mathematical analysis, eigenvalues and eigenvectors, physics, materials science, composite material, quantum mechanics, segmentation, image segmentation, pure mathematics

A new non-negative matrix factorization (NMF) based algorithm is proposed for single-channel speech separation with a priori known speakers. It aims to better model the spectral structure and temporal continuity of the speech signal. First, NMF and k-means clustering are employed to obtain, for each speaker, multiple small dictionaries as well as a state sequence that describes the temporal dynamics between these dictionaries. Then, a factorial conditional random field (FCRF) model is trained using the state sequences and dictionaries to jointly model the temporal continuity of the two speakers' mixed signal for separation. Experiments show that the proposed algorithm outperforms the baselines with respect to all metrics, for example sparse NMF (+1.12 dB SDR, +2.37 dB SIR, +0.40 dB SAR, +0.2 MOS), the non-negative factorial hidden Markov model (+2.04 dB SDR, +4.26 dB SIR, +0.62 dB SAR, +1.0 MOS) and standard NMF (+2.8 dB SDR, +5.08 dB SIR, +1.06 dB SAR, +1.2 MOS).
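
The first stage described in the abstract (per-speaker small dictionaries plus a state sequence) can be illustrated with a minimal sketch. The abstract does not say in which order NMF and k-means are applied, so the code below assumes one plausible reading: spectrogram frames are clustered with k-means, the cluster labels serve as the state sequence, and one small NMF dictionary is learned per cluster. The function names (nmf, kmeans, train_speaker), the Euclidean multiplicative updates, and the parameters n_states and atoms_per_state are illustrative assumptions, not the authors' implementation. The FCRF stage is not sketched.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (F x T) as W @ H using the classic
    multiplicative updates for the Euclidean cost (an assumed cost choice)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + eps
    H = rng.random((rank, T)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns (centres, labels)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every centre.
        dist = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres, labels


def train_speaker(V, n_states=3, atoms_per_state=4):
    """Hypothetical training step for one speaker.

    V is a magnitude spectrogram (frequency bins x frames). Returns a list
    of small dictionaries (one per state) and the per-frame state sequence.
    """
    _, states = kmeans(V.T, n_states)  # frames are the columns of V
    dictionaries = []
    for s in range(n_states):
        frames = V[:, states == s]
        if frames.shape[1] == 0:
            frames = V  # fall back to all frames if a cluster is empty
        W, _ = nmf(frames, atoms_per_state)
        dictionaries.append(W)
    return dictionaries, states
```

Under these assumptions, calling train_speaker on each known speaker's training spectrogram yields the dictionaries and state sequences that a temporal model such as the FCRF would then be trained on.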
