Presentation #7
Session: Voice Conversion and TTS
Location: Kallirhoe Hall
Session Time: Friday, December 21, 10:00 - 12:00
Presentation Time:Friday, December 21, 10:00 - 12:00
Presentation: Poster
Topic: Speech recognition and synthesis:
Paper Title: Comparing Prosodic Frameworks: Investigating the Acoustic-Symbolic Relationship in ToBI and RaP
Authors: Raul Fernandez, Andrew Rosenberg, IBM Research, United States
Abstract: ToBI is the dominant tool for symbolically describing prosodic content in American English speech material. Its dominance stems from its descriptive power and theoretical grounding, but also from the large amount of annotated data available. Recently, a modest amount of material annotated with the Rhythm and Pitch (RaP) framework was released publicly. In this paper, we investigate the acoustic-symbolic relationship under these two systems. We present experiments that examine this relationship in both directions. From acoustic to symbolic, we compare the automatic prediction of prosodic prominence as defined under each system. From symbolic to acoustic, we examine how useful each annotation standard is for correctly prescribing the acoustics of a given utterance from its symbolic sequence. We find RaP promising: for some aspects of these tasks, it shows a somewhat stronger acoustic-symbolic relationship than ToBI given a comparable amount of data. Although ToBI results are stronger when more annotated data is available, it remains to be shown whether RaP performance can scale up with more data.