| Paper ID | MLSP-24.2 | ||
| Paper Title | Efficient Adversarial Audio Synthesis via Progressive Upsampling | ||
| Authors | Youngwoo Cho, Korea Advanced Institute of Science and Technology (KAIST), South Korea; Minwook Chang, NCSOFT, South Korea; Sanghyeon Lee, Korea Advanced Institute of Science and Technology (KAIST), South Korea; Hyoungwoo Lee, Gerard Jounghyun Kim, Korea University, South Korea; Jaegul Choo, Korea Advanced Institute of Science and Technology (KAIST), South Korea | ||
| Session | MLSP-24: Applications in Audio and Speech Processing | ||
| Location | Gather.Town | ||
| Session Time | Wednesday, 09 June, 16:30 - 17:15 | ||
| Presentation Time | Wednesday, 09 June, 16:30 - 17:15 | ||
| Presentation | Poster | ||
| Topic | Machine Learning for Signal Processing: [MLR-APPL] Applications of machine learning | ||
| IEEE Xplore Open Preview | Available in IEEE Xplore | ||
| Abstract | This paper proposes a novel generative model called progressive upsampling GAN (PUGAN), which progressively synthesizes high-quality audio as raw waveforms. PUGAN builds on the earlier idea of progressively generating higher-resolution output, which it realizes by stacking multiple encoder-decoder architectures. The existing state-of-the-art model, WaveGAN, uses a single decoder architecture. Our model instead generates audio signals and converts them to higher resolutions step by step, and it uses significantly fewer parameters than WaveGAN (e.g., 3.17 times fewer for 16 kHz output). Our experiments show that PUGAN generates audio in real time, with quality comparable to that of WaveGAN in terms of inception score and human perceptual evaluation. | ||
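The coarse-to-fine idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' PUGAN implementation. The 1 kHz starting rate, the factor-of-2 stages, and the helper names `linear_upsample_2x` and `progressive_generate` are hypothetical. Simple linear interpolation stands in for each learned encoder-decoder stage, so the sketch shows only how stacked stages raise the sample rate, not how the network learns to add detail.

```python
from typing import Callable, List, Tuple

Signal = List[float]


def linear_upsample_2x(x: Signal) -> Signal:
    """Hypothetical stand-in for one learned encoder-decoder stage.

    Doubles the sample count by inserting the midpoint between neighbouring
    samples. A real stage would be a trained network, not interpolation.
    """
    out: Signal = []
    for i, s in enumerate(x):
        out.append(s)
        # At the last sample, repeat it instead of reading past the end.
        nxt = x[i + 1] if i + 1 < len(x) else s
        out.append(0.5 * (s + nxt))
    return out


def progressive_generate(
    base: Signal,
    base_rate: int,
    num_stages: int,
    stage: Callable[[Signal], Signal] = linear_upsample_2x,
) -> List[Tuple[int, Signal]]:
    """Apply `num_stages` 2x stages in sequence (hypothetical helper).

    Returns every intermediate (sample_rate, signal) pair, starting with the
    coarse input, to mirror a coarse-to-fine generation pipeline.
    """
    outputs: List[Tuple[int, Signal]] = [(base_rate, base)]
    x, rate = base, base_rate
    for _ in range(num_stages):
        x = stage(x)
        rate *= 2
        outputs.append((rate, x))
    return outputs


if __name__ == "__main__":
    # Assumed example: a 4-sample "coarse" clip at 1 kHz, upsampled four
    # times to reach 16 kHz.
    coarse = [0.0, 1.0, 0.0, -1.0]
    for rate, sig in progressive_generate(coarse, 1000, 4):
        print(rate, len(sig))
```

Running the script prints one line per resolution, from `1000 4` up to `16000 64`. Returning every intermediate output, rather than only the final one, reflects the progressive design: each resolution is an explicit output of the pipeline that a multi-stage model could supervise or inspect separately.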