ASR IV |
Session Type: Poster |
Time: Friday, December 21, 13:30 - 15:30 |
Location: Kallirhoe Hall |
|
TOWARD DOMAIN-INVARIANT SPEECH RECOGNITION VIA LARGE SCALE TRAINING |
Arun Narayanan; Google |
Ananya Misra; Google |
Khe Chai Sim; Google |
Golan Pundak; Google |
Anshuman Tripathi; Google |
Mohamed Elfeky; Google |
Parisa Haghani; Google |
Trevor Strohman; Google |
Michiel Bacchiani; Google |
|
TRANSLITERATION BASED APPROACHES TO IMPROVE CODE-SWITCHED SPEECH RECOGNITION PERFORMANCE |
Jesse Emond; Google |
Bhuvana Ramabhadran; Google |
Brian Roark; Google |
Pedro Moreno; Google |
Min Ma; Google |
|
EXPLORING LAYER TRAJECTORY LSTM WITH DEPTH PROCESSING UNITS AND ATTENTION |
Jinyu Li; Microsoft |
Liang Lu; Microsoft |
Changliang Liu; Microsoft |
Yifan Gong; Microsoft |
|
MULTICHANNEL ASR WITH KNOWLEDGE DISTILLATION AND GENERALIZED CROSS CORRELATION FEATURE |
Wenjie Li; Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics |
Yu Zhang; Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics |
Pengyuan Zhang; Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics |
Fengpei Ge; Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics |
|
OPTIMIZING THE QUALITY OF SYNTHETICALLY GENERATED PSEUDOWORDS FOR THE TASK OF MINIMAL-PAIR DISTINCTION |
Heiko Holz; University of Tübingen |
Maria Chinkina; University of Tübingen |
Laura Vetter; Ludwig Maximilian University of Munich |
|
LEVERAGING SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS FOR ENHANCING ACOUSTIC-TO-WORD SPEECH RECOGNITION |
Masato Mimura; Kyoto University |
Sei Ueno; Kyoto University |
Hirofumi Inaguma; Kyoto University |
Shinsuke Sakai; Kyoto University |
Tatsuya Kawahara; Kyoto University |
|
HIERARCHICAL MULTITASK LEARNING WITH CTC |
Ramon Sanabria; Carnegie Mellon University |
Florian Metze; Carnegie Mellon University |
|
A K-NEAREST NEIGHBOURS APPROACH TO UNSUPERVISED SPOKEN TERM DISCOVERY |
Alexis Thual; ENS |
Corentin Dancette; ENS |
Julien Karadayi; ENS |
Juan Benjumea; ENS |
Emmanuel Dupoux; ENS |
|
A NEW TIMIT BENCHMARK FOR CONTEXT-INDEPENDENT PHONE RECOGNITION USING TURBO FUSION |
Timo Lohrenz; TU Braunschweig |
Wei Li; TU Braunschweig |
Tim Fingscheidt; TU Braunschweig |
|
EFFICIENT IMPLEMENTATION OF RECURRENT NEURAL NETWORK TRANSDUCER IN TENSORFLOW |
Tom Bagby; Google |
Kanishka Rao; Google |
Khe Chai Sim; Google |
|
AUDIO-VISUAL SPEECH RECOGNITION WITH A HYBRID CTC/ATTENTION ARCHITECTURE |
Stavros Petridis; Imperial College London |
Themos Stafylakis; University of Nottingham |
Pingchuan Ma; Imperial College London |
Georgios Tzimiropoulos; University of Nottingham |
Maja Pantic; Imperial College London |
|
MULTILINGUAL SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION: ARCHITECTURE, TRANSFER LEARNING, AND LANGUAGE MODELING |
Jaejin Cho; Johns Hopkins University |
Murali Karthick Baskar; Brno University of Technology |
Ruizhi Li; Johns Hopkins University |
Matthew Wiesner; Johns Hopkins University |
Sri Harish Mallidi; Amazon |
Nelson Yalta; Waseda University |
Martin Karafiat; Brno University of Technology |
Shinji Watanabe; Johns Hopkins University |
Takaaki Hori; Mitsubishi Electric Research Laboratories |
|
SPEAKER SELECTIVE BEAMFORMER WITH KEYWORD MASK ESTIMATION |
Yusuke Kida; Yahoo Japan Corporation |
Dung Tran; Yahoo Japan Corporation |
Motoi Omachi; Yahoo Japan Corporation |
Toru Taniguchi; Yahoo Japan Corporation |
Yuya Fujita; Yahoo Japan Corporation |
|
SPEAKER ADAPTED BEAMFORMING FOR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION |
Tobias Menne; RWTH Aachen University |
Ralf Schlüter; RWTH Aachen University |
Hermann Ney; RWTH Aachen University |
|
SPEAKER ADAPTATION FOR END-TO-END CTC MODELS |
Ke Li; Johns Hopkins University |
Jinyu Li; Microsoft AI and Research |
Yong Zhao; Microsoft AI and Research |
Kshitiz Kumar; Microsoft AI and Research |
Yifan Gong; Microsoft AI and Research |
|
AN EXPLORATION OF MIMIC ARCHITECTURES FOR RESIDUAL NETWORK BASED SPECTRAL MAPPING |
Peter Plantinga; The Ohio State University |
Deblin Bagchi; The Ohio State University |
Eric Fosler-Lussier; The Ohio State University |
|
MULTI-CHANNEL MULTI-SPEAKER OVERLAPPED SPEECH RECOGNITION WITH LOCATION GUIDED SPEECH EXTRACTION NETWORK |
Zhuo Chen; Microsoft Cloud & AI |
Xiong Xiao; Microsoft Cloud & AI |
Takuya Yoshioka; Microsoft Cloud & AI |
Jinyu Li; Microsoft Cloud & AI |
Hakan Erdogan; Microsoft Cloud & AI |
Yifan Gong; Microsoft Cloud & AI |
|
A STUDY ON SPEECH ENHANCEMENT USING EXPONENT-ONLY FLOATING POINT QUANTIZED NEURAL NETWORK (EOFP-QNN) |
Yi-Te Hsu; Academia Sinica |
Yu-Chen Lin; National Taiwan University |
Szu-Wei Fu; National Taiwan University |
Yu Tsao; Academia Sinica |
Tei-Wei Kuo; National Taiwan University |
|
RAPID SPEAKER ADAPTATION OF NEURAL NETWORK BASED FILTERBANK LAYER FOR AUTOMATIC SPEECH RECOGNITION |
Hiroshi Seki; Toyohashi University of Technology |
Kazumasa Yamamoto; Chubu University |
Tomoyosi Akiba; Toyohashi University of Technology |
Seiichi Nakagawa; Chubu University |
|
FAR-FIELD ASR USING LOW-RANK AND SPARSE SOFT TARGETS FROM PARALLEL DATA |
Pranay Dighe; Idiap Research Institute, EPFL |
Afsaneh Asaei; Idiap Research Institute |
Herve Bourlard; Idiap Research Institute, EPFL |
|
DEEP VIEW2VIEW MAPPING FOR VIEW-INVARIANT LIPREADING |
Alexandros Koumparoulis; National Technical University of Athens |
Gerasimos Potamianos; University of Thessaly |
|