Presentation # | 6 |
Session: | Voice Conversion and TTS |
Session Time: | Friday, December 21, 10:00 - 12:00 |
Presentation Time: | Friday, December 21, 10:00 - 12:00 |
Presentation: | Poster |
Topic: | Speech recognition and synthesis: |
Paper Title: | IMPROVING FFTNET VOCODER WITH NOISE SHAPING AND SUBBAND APPROACHES |
Authors: | Takuma Okamoto; National Institute of Information and Communications Technology |
| Tomoki Toda; Nagoya University |
| Yoshinori Shiga; National Institute of Information and Communications Technology |
| Hisashi Kawai; National Institute of Information and Communications Technology |
Abstract: | Compared with the WaveNet vocoder, the FFTNet vocoder can synthesize speech waveforms in real time, but the quality of its synthesized speech is lower. To improve the synthesized speech quality of the FFTNet neural vocoder while keeping the network model size suitable for real-time synthesis, this paper proposes the following four approaches. 1) Residual connections are introduced into FFTNet to improve prediction accuracy. 2) Noise shaping and 3) subband approaches, which significantly improve the synthesized speech quality of the WaveNet vocoder, are applied directly to the FFTNet vocoder. 4) A subband FFTNet vocoder with multiband input is additionally proposed to directly compensate for the phase shift between subbands. The proposed approaches are evaluated in both objective and subjective experiments using a Japanese male speech corpus with a sampling frequency of 16 kHz, in comparison with STRAIGHT without mel-cepstral compression and with vanilla FFTNet and WaveNet vocoders. The results indicate that the proposed approaches successfully improve the synthesized speech quality of the FFTNet vocoder. In particular, the proposed vocoder with noise shaping significantly outperforms STRAIGHT. |
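The residual connections of approach 1) can be illustrated with a minimal NumPy sketch of an FFTNet-style layer. It assumes the layer form commonly described for FFTNet, in which each layer combines two samples a fixed shift apart through 1x1 weights and ReLU, with shifts halving from layer to layer. It further assumes that the residual connection simply adds the time-aligned layer input to the layer output; the paper's exact residual placement, the acoustic-feature conditioning, and the output softmax are not shown. All function and parameter names are hypothetical.

```python
import numpy as np


def fftnet_layer(x, shift, w_l, w_r, w_out, residual=True):
    """One FFTNet-style layer (hypothetical sketch).

    x      : array of shape (channels, T)
    shift  : distance between the two combined samples (N/2, N/4, ..., 1)
    w_l    : (channels, channels) weight for the earlier sample x[t - shift]
    w_r    : (channels, channels) weight for the later sample x[t]
    w_out  : (channels, channels) output projection
    Returns an array of shape (channels, T - shift).
    """
    x_left = x[:, :-shift]   # x[t - shift]
    x_right = x[:, shift:]   # x[t], time-aligned with x_left
    z = np.maximum(0.0, w_l @ x_left + w_r @ x_right)  # ReLU
    out = np.maximum(0.0, w_out @ z)                   # ReLU
    if residual:
        # Assumed residual form: add the time-aligned layer input.
        out = out + x_right
    return out


def fftnet_stack(x, receptive_field, weights, residual=True):
    """Apply layers with shifts receptive_field/2, /4, ..., 1 (power of two)."""
    shift = receptive_field // 2
    for w_l, w_r, w_out in weights:
        x = fftnet_layer(x, shift, w_l, w_r, w_out, residual)
        shift //= 2
    return x


# Usage: 4 channels, receptive field of 16 samples -> 4 layers (shifts 8, 4, 2, 1).
rng = np.random.default_rng(0)
channels, receptive_field = 4, 16
n_layers = int(np.log2(receptive_field))
weights = [tuple(rng.standard_normal((channels, channels)) * 0.1 for _ in range(3))
           for _ in range(n_layers)]
x = rng.standard_normal((channels, receptive_field))
y = fftnet_stack(x, receptive_field, weights)
print(y.shape)  # (4, 1)
```

Each layer shortens the sequence by its shift, so shifts of 8, 4, 2 and 1 reduce a 16-sample window to a single output step. Because the output of each layer is aligned with the later sample `x[t]`, the residual path adds `x[:, shift:]` rather than the full input.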