Paper ID | AUD-7.5
Paper Title | SEPNET: A DEEP SEPARATION MATRIX PREDICTION NETWORK FOR MULTICHANNEL AUDIO SOURCE SEPARATION
Authors | Shota Inoue, University of Tsukuba, Japan; Hirokazu Kameoka, NTT Communication Science Laboratories, Japan; Li Li, Shoji Makino, University of Tsukuba, Japan
Session | AUD-7: Audio and Speech Source Separation 3: Deep Learning
Location | Gather.Town
Session Time | Wednesday, 09 June, 13:00 - 13:45
Presentation Time | Wednesday, 09 June, 13:00 - 13:45
Presentation | Poster
Topic | Audio and Acoustic Signal Processing: [AUD-SEP] Audio and Speech Source Separation
Abstract | In this paper, we propose SepNet, a deep neural network (DNN) designed to predict separation matrices from multichannel observations. One well-known approach to blind source separation (BSS) is independent component analysis (ICA), and independent low-rank matrix analysis (ILRMA) is one of its powerful variants. These methods estimate separation matrices using deterministic iterative algorithms. The existence of a deterministic iterative algorithm that can find one of the stationary points of the BSS problem implies that a DNN can play the same role if designed and trained properly. Motivated by this, we propose introducing a DNN that learns to convert a predefined input into a true separation matrix in accordance with a multichannel observation. To enable it to find one of the multiple solutions corresponding to different permutations of the source indices, we further propose adopting a permutation-invariant training strategy. By using a fully convolutional architecture, we can design the network so that forward propagation can be computed efficiently. The experimental results revealed that SepNet found separation matrices faster and with better separation accuracy than ILRMA for mixtures of two sources.
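For readers unfamiliar with the two ingredients the abstract relies on, the following is a minimal NumPy sketch of (i) how a per-frequency separation matrix is applied to a multichannel STFT observation, i.e., y(f, t) = W(f) x(f, t), and (ii) a generic permutation-invariant training (PIT) loss of the kind the abstract refers to. The function names, tensor shapes, and the mean-squared-error criterion are illustrative assumptions, not the paper's actual formulation.

```python
import itertools
import numpy as np

def apply_separation(W, X):
    """Apply per-frequency separation matrices to a multichannel STFT.

    W: (F, N, M) complex separation matrices (N sources, M mics, F bins)
    X: (F, T, M) complex multichannel observation (T time frames)
    Returns Y: (F, T, N) separated estimates, y(f, t) = W(f) x(f, t).
    """
    # einsum contracts the microphone axis for every (f, t) pair.
    return np.einsum('fnm,ftm->ftn', W, X)

def permutation_invariant_loss(Y_est, Y_ref):
    """Generic PIT loss: minimum MSE over all source-index permutations.

    Y_est, Y_ref: (F, T, N) estimated and reference source spectrograms.
    Minimizing over permutations leaves the network free to output the
    sources in any order, since any permutation of a valid separation
    matrix is an equally valid solution of the BSS problem.
    """
    N = Y_est.shape[-1]
    losses = []
    for perm in itertools.permutations(range(N)):
        # Mean squared error under this assignment of estimates to references.
        losses.append(np.mean(np.abs(Y_est[..., list(perm)] - Y_ref) ** 2))
    return min(losses)
```

In SepNet, W would be produced by the network from the observation; taking the minimum over permutations means the network is rewarded for finding any one of the N! equivalent solutions, which is why a permutation-invariant strategy is needed when training a network to predict separation matrices directly.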