Paper ID | AUD-2.3 | ||
Paper Title | COMPLEX RATIO MASKING FOR SINGING VOICE SEPARATION | ||
Authors | Yixuan Zhang, Yuzhou Liu, DeLiang Wang, The Ohio State University, United States | ||
Session | AUD-2: Audio and Speech Source Separation 2: Music and Singing Voice Separation | ||
Location | Gather.Town | ||
Session Time: | Tuesday, 08 June, 13:00 - 13:45 | ||
Presentation Time: | Tuesday, 08 June, 13:00 - 13:45 | ||
Presentation | Poster | ||
Topic | Audio and Acoustic Signal Processing: [AUD-SEP] Audio and Speech Source Separation | ||
IEEE Xplore Open Preview | Click here to view in IEEE Xplore | ||
Abstract | Music source separation is important for applications such as karaoke and remixing. Much of previous research focuses on estimating short-time Fourier transform (STFT) magnitude and discarding phase information. We observe that, for singing voice separation, phase can make considerable improvement in separation quality. This paper proposes a complex ratio masking method for voice and accompaniment separation. The proposed method employs DenseUNet with self attention to estimate the real and imaginary components of STFT for each sound source. A simple ensemble technique is introduced to further improve separation performance. Evaluation results demonstrate that the proposed method outperforms recent state-of-the-art models for both separated voice and accompaniment. |