SPE-39.5
INCREMENTAL TEXT-TO-SPEECH SYNTHESIS USING PSEUDO LOOKAHEAD WITH LARGE PRETRAINED LANGUAGE MODEL
Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari, The University of Tokyo, Japan
Session:
Speech Synthesis: Front-end
Track:
Speech and Language Processing
Location:
Gather Area D
Presentation Time:
Tue, 10 May, 21:00 - 21:45 China Time (UTC +8)
Tue, 10 May, 13:00 - 13:45 UTC
Session Chair:
Daniele Giacobello, Apple Inc.
Session SPE-39
SPE-39.1: PART-OF-SPEECH MODELS COMPRESSION METHODS FOR ON-DEVICE GRAPHEME-TO-PHONEME CONVERSION
Marek Kubis, Paweł Skórzewski, Adam Mickiewicz University in Poznań, Poland; Maxime Méloux, Marcin Lewandowski, Samsung R&D Institute Poland, Poland; Gunu Jho, Hyoungmin Park, Samsung Electronics, Republic of Korea
SPE-39.2: AN END-TO-END CHINESE TEXT NORMALIZATION MODEL BASED ON RULE-GUIDED FLAT-LATTICE TRANSFORMER
Wenlin Dai, Changhe Song, Xiang Li, Zhiyong Wu, Tsinghua University, China; Huashan Pan, Xiulin Li, Databaker Technology Co., Ltd, China; Helen Meng, The Chinese University of Hong Kong, China
SPE-39.3: CHINESE SPELLING TEXT GENERATION OF MATHEMATICAL FORMULAS
Su Dong, Sicen Liu, Buzhou Tang, Harbin Institute of Technology, China; Shan Liu, Tencent Technology Company, China
SPE-39.4: POLYPHONE DISAMBIGUATION AND ACCENT PREDICTION USING PRE-TRAINED LANGUAGE MODELS IN JAPANESE TTS FRONT-END
Rem Hida, Masaki Hamada, Emiru Tsunoo, Toshiyuki Sekiya, Sony Group Corporation, Japan; Chie Kamada, Toshiyuki Kumakura, Sony Corporation of America, United States of America
SPE-39.5: INCREMENTAL TEXT-TO-SPEECH SYNTHESIS USING PSEUDO LOOKAHEAD WITH LARGE PRETRAINED LANGUAGE MODEL
Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari, The University of Tokyo, Japan
SPE-39.6: DATA AUGMENTATION FOR LONG-TAILED AND IMBALANCED POLYPHONE DISAMBIGUATION IN MANDARIN
Yang Zhang, Haitong Zhang, Yue Lin, NetEase Games AI Lab, China