| ASR IV |
| Session Type: Poster |
| Time: Friday, December 21, 13:30 - 15:30 |
| Location: Kallirhoe Hall |
| TOWARD DOMAIN-INVARIANT SPEECH RECOGNITION VIA LARGE SCALE TRAINING |
| Arun Narayanan; Google |
| Ananya Misra; Google |
| Khe Chai Sim; Google |
| Golan Pundak; Google |
| Anshuman Tripathi; Google |
| Mohamed Elfeky; Google |
| Parisa Haghani; Google |
| Trevor Strohman; Google |
| Michiel Bacchiani; Google |
| TRANSLITERATION BASED APPROACHES TO IMPROVE CODE-SWITCHED SPEECH RECOGNITION PERFORMANCE |
| Jesse Emond; Google |
| Bhuvana Ramabhadran; Google |
| Brian Roark; Google |
| Pedro Moreno; Google |
| Min Ma; Google |
| EXPLORING LAYER TRAJECTORY LSTM WITH DEPTH PROCESSING UNITS AND ATTENTION |
| Jinyu Li; Microsoft |
| Liang Lu; Microsoft |
| Changliang Liu; Microsoft |
| Yifan Gong; Microsoft |
| MULTICHANNEL ASR WITH KNOWLEDGE DISTILLATION AND GENERALIZED CROSS CORRELATION FEATURE |
| Wenjie Li; Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics |
| Yu Zhang; Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics |
| Pengyuan Zhang; Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics |
| Fengpei Ge; Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics |
| OPTIMIZING THE QUALITY OF SYNTHETICALLY GENERATED PSEUDOWORDS FOR THE TASK OF MINIMAL-PAIR DISTINCTION |
| Heiko Holz; University of Tübingen |
| Maria Chinkina; University of Tübingen |
| Laura Vetter; Ludwig Maximilian University of Munich |
| LEVERAGING SEQUENCE-TO-SEQUENCE SPEECH SYNTHESIS FOR ENHANCING ACOUSTIC-TO-WORD SPEECH RECOGNITION |
| Masato Mimura; Kyoto University |
| Sei Ueno; Kyoto University |
| Hirofumi Inaguma; Kyoto University |
| Shinsuke Sakai; Kyoto University |
| Tatsuya Kawahara; Kyoto University |
| HIERARCHICAL MULTITASK LEARNING WITH CTC |
| Ramon Sanabria; Carnegie Mellon University |
| Florian Metze; Carnegie Mellon University |
| A K-NEAREST NEIGHBOURS APPROACH TO UNSUPERVISED SPOKEN TERM DISCOVERY |
| Alexis Thual; ENS |
| Corentin Dancette; ENS |
| Julien Karadayi; ENS |
| Juan Benjumea; ENS |
| Emmanuel Dupoux; ENS |
| A NEW TIMIT BENCHMARK FOR CONTEXT-INDEPENDENT PHONE RECOGNITION USING TURBO FUSION |
| Timo Lohrenz; TU Braunschweig |
| Wei Li; TU Braunschweig |
| Tim Fingscheidt; TU Braunschweig |
| EFFICIENT IMPLEMENTATION OF RECURRENT NEURAL NETWORK TRANSDUCER IN TENSORFLOW |
| Tom Bagby; Google |
| Kanishka Rao; Google |
| Khe Chai Sim; Google |
| AUDIO-VISUAL SPEECH RECOGNITION WITH A HYBRID CTC/ATTENTION ARCHITECTURE |
| Stavros Petridis; Imperial College London |
| Themos Stafylakis; University of Nottingham |
| Pingchuan Ma; Imperial College London |
| Georgios Tzimiropoulos; University of Nottingham |
| Maja Pantic; Imperial College London |
| MULTILINGUAL SEQUENCE-TO-SEQUENCE SPEECH RECOGNITION: ARCHITECTURE, TRANSFER LEARNING, AND LANGUAGE MODELING |
| Jaejin Cho; Johns Hopkins University |
| Murali Karthick Baskar; Brno University of Technology |
| Ruizhi Li; Johns Hopkins University |
| Matthew Wiesner; Johns Hopkins University |
| Sri Harish Mallidi; Amazon |
| Nelson Yalta; Waseda University |
| Martin Karafiat; Brno University of Technology |
| Shinji Watanabe; Johns Hopkins University |
| Takaaki Hori; Mitsubishi Electric Research Laboratories |
| SPEAKER SELECTIVE BEAMFORMER WITH KEYWORD MASK ESTIMATION |
| Yusuke Kida; Yahoo Japan Corporation |
| Dung Tran; Yahoo Japan Corporation |
| Motoi Omachi; Yahoo Japan Corporation |
| Toru Taniguchi; Yahoo Japan Corporation |
| Yuya Fujita; Yahoo Japan Corporation |
| SPEAKER ADAPTED BEAMFORMING FOR MULTI-CHANNEL AUTOMATIC SPEECH RECOGNITION |
| Tobias Menne; RWTH Aachen University |
| Ralf Schlüter; RWTH Aachen University |
| Hermann Ney; RWTH Aachen University |
| SPEAKER ADAPTATION FOR END-TO-END CTC MODELS |
| Ke Li; Johns Hopkins University |
| Jinyu Li; Microsoft AI and Research |
| Yong Zhao; Microsoft AI and Research |
| Kshitiz Kumar; Microsoft AI and Research |
| Yifan Gong; Microsoft AI and Research |
| AN EXPLORATION OF MIMIC ARCHITECTURES FOR RESIDUAL NETWORK BASED SPECTRAL MAPPING |
| Peter Plantinga; The Ohio State University |
| Deblin Bagchi; The Ohio State University |
| Eric Fosler-Lussier; The Ohio State University |
| MULTI-CHANNEL MULTI-SPEAKER OVERLAPPED SPEECH RECOGNITION WITH LOCATION GUIDED SPEECH EXTRACTION NETWORK |
| Zhuo Chen; Microsoft Cloud & AI |
| Xiong Xiao; Microsoft Cloud & AI |
| Takuya Yoshioka; Microsoft Cloud & AI |
| Jinyu Li; Microsoft Cloud & AI |
| Hakan Erdogan; Microsoft Cloud & AI |
| Yifan Gong; Microsoft Cloud & AI |
| A STUDY ON SPEECH ENHANCEMENT USING EXPONENT-ONLY FLOATING POINT QUANTIZED NEURAL NETWORK (EOFP-QNN) |
| Yi-Te Hsu; Academia Sinica |
| Yu-Chen Lin; National Taiwan University |
| Szu-Wei Fu; National Taiwan University |
| Yu Tsao; Academia Sinica |
| Tei-Wei Kuo; National Taiwan University |
| RAPID SPEAKER ADAPTATION OF NEURAL NETWORK BASED FILTERBANK LAYER FOR AUTOMATIC SPEECH RECOGNITION |
| Hiroshi Seki; Toyohashi University of Technology |
| Kazumasa Yamamoto; Chubu University |
| Tomoyosi Akiba; Toyohashi University of Technology |
| Seiichi Nakagawa; Chubu University |
| FAR-FIELD ASR USING LOW-RANK AND SPARSE SOFT TARGETS FROM PARALLEL DATA |
| Pranay Dighe; Idiap Research Institute, EPFL |
| Afsaneh Asaei; Idiap Research Institute |
| Herve Bourlard; Idiap Research Institute, EPFL |
| DEEP VIEW2VIEW MAPPING FOR VIEW-INVARIANT LIPREADING |
| Alexandros Koumparoulis; National Technical University of Athens |
| Gerasimos Potamianos; University of Thessaly |