The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio
Author(s) -
Shan Liang,
Wenju Liu,
Wei Jiang,
Wei Xue
Publication year - 2013
Publication title -
The Journal of the Acoustical Society of America
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.619
H-Index - 187
eISSN - 1520-8524
pISSN - 0001-4966
DOI - 10.1121/1.4824632
Subject(s) - orthogonality, computer science, signal to noise ratio (imaging), disjoint sets, monaural, correctness, separation (statistics), signal to interference ratio, speech recognition, noise (video), interference (communication), algorithm, acoustics, mathematics, artificial intelligence, telecommunications, physics, power (physics), channel (broadcasting), geometry, combinatorics, quantum mechanics, machine learning, image (mathematics)
In this paper, a computational goal for a monaural speech separation system is proposed. Because this goal is derived by maximizing the signal-to-noise ratio (SNR), it is called the optimal ratio mask (ORM). Under the approximate W-Disjoint Orthogonality assumption, which almost always holds due to the sparse nature of speech, theoretical analysis shows that the ORM can improve the SNR by about 10 log10(2) dB (approximately 3 dB) over the ideal ratio mask. With three kinds of real-world interference, the speech separation results for SNR gain and objective quality evaluation support the theoretical analysis and indicate that the ORM achieves better separation performance.
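The comparison described in the abstract can be illustrated with a small sketch. The abstract does not give the closed form of the ORM, so the code below makes an assumption: it uses the real-valued mask that minimizes the squared error in each time-frequency unit, which is what maximizing the output SNR amounts to for a real gain. The function names (`ideal_ratio_mask`, `least_squares_real_mask`, `snr_db`) and the synthetic complex Gaussian "spectra" are hypothetical stand-ins, not the authors' implementation or data.

```python
import math
import random


def snr_db(target, estimate):
    """Output SNR in dB: target energy over the energy of the estimation error."""
    signal = sum(abs(s) ** 2 for s in target)
    error = sum(abs(s - e) ** 2 for s, e in zip(target, estimate))
    return 10.0 * math.log10(signal / error)


def ideal_ratio_mask(s, n):
    """Ideal ratio mask for one unit: speech power over speech-plus-noise power."""
    ps, pn = abs(s) ** 2, abs(n) ** 2
    return ps / (ps + pn)


def least_squares_real_mask(s, n):
    """Assumed SNR-maximizing real mask for one unit (not taken from the abstract).

    Minimizing |s - m * (s + n)|^2 over real m gives m = Re(s * conj(y)) / |y|^2,
    where y = s + n is the mixture coefficient.
    """
    y = s + n
    return (s * y.conjugate()).real / abs(y) ** 2


# Synthetic stand-ins for STFT coefficients of speech (S) and interference (N).
rng = random.Random(0)
S = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]
N = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(5000)]
Y = [s + n for s, n in zip(S, N)]

irm_snr = snr_db(S, [ideal_ratio_mask(s, n) * y for s, n, y in zip(S, N, Y)])
orm_snr = snr_db(S, [least_squares_real_mask(s, n) * y for s, n, y in zip(S, N, Y)])

# The gain quoted in the abstract, 10 log10(2), evaluated numerically.
predicted_gain_db = 10.0 * math.log10(2.0)

print(f"IRM SNR: {irm_snr:.2f} dB, least-squares mask SNR: {orm_snr:.2f} dB")
print(f"10 log10(2) = {predicted_gain_db:.2f} dB")
```

Because the least-squares mask minimizes the error in every unit over all real gains, its output SNR cannot fall below that of the ideal ratio mask. The random data here are not speech and do not satisfy the approximate W-Disjoint Orthogonality assumption, so the measured gain should not be expected to match the 10 log10(2) dB figure from the theoretical analysis.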