
Multi‐band Approach to Deep Learning‐Based Artificial Stereo Extension
Author(s) -
Jeon Kwang Myung,
Park Su Yeon,
Chun Chan Jun,
Park Nam In,
Kim Hong Kook
Publication year - 2017
Publication title -
etri journal
Language(s) - English
Resource type - Journals
SCImago Journal Rank - 0.295
H-Index - 46
eISSN - 2233-7326
pISSN - 1225-6463
DOI - 10.4218/etrij.17.0116.0773
Subject(s) - extension (predicate logic) , artificial intelligence , deep learning , computer science , computer vision , engineering , programming language
In this paper, an artificial stereo extension method that creates stereophonic sound from a mono sound source is proposed. The proposed method first trains deep neural networks (DNNs) that model the nonlinear relationship between the dominant and residual signals of the stereo channel. In the training stage, the band‐wise log spectral magnitude and unwrapped phase of both the dominant and residual signals are utilized to model the nonlinearities of each sub‐band through deep architecture. From that point, stereo extension is conducted by estimating the residual signal that corresponds to the input mono channel signal with the trained DNN model in a sub‐band domain. The performance of the proposed method was evaluated using a log spectral distortion (LSD) measure and multiple stimuli with a hidden reference and anchor (MUSHRA) test. The results showed that the proposed method provided a lower LSD and higher MUSHRA score than conventional methods that use hidden Markov models and DNN with full‐band processing.