Paper ID | MLSP-11.4 |
Paper Title |
SIMILARITY ANALYSIS OF SELF-SUPERVISED SPEECH REPRESENTATIONS |
Authors |
Yu-An Chung, Massachusetts Institute of Technology, United States; Yonatan Belinkov, Technion Henry and Marilyn Taub Faculty of Computer Science, Israel; James Glass, Massachusetts Institute of Technology, United States |
Session | MLSP-11: Self-supervised Learning for Speech Processing |
Location | Gather.Town |
Session Time: | Tuesday, 08 June, 16:30 - 17:15 |
Presentation Time: | Tuesday, 08 June, 16:30 - 17:15 |
Presentation |
Poster
|
Topic |
Machine Learning for Signal Processing: [MLR-SSUP] Self-supervised and semi-supervised learning |
IEEE Xplore Open Preview |
Click here to view in IEEE Xplore |
Virtual Presentation |
Click here to watch in the Virtual Conference |
Abstract |
Self-supervised speech representation learning has recently been a prosperous research topic. Many algorithms have been proposed for learning useful representations from large-scale unlabeled data, and their applications to a wide range of speech tasks have also been investigated. However, there has been little research focusing on understanding the properties of existing approaches. In this work, we aim to provide a comparative study of some of the most representative self-supervised algorithms. Specifically, we quantify the similarities between different self-supervised representations using existing similarity measures. We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations. In addition to showing how various self-supervised models behave differently given the same input, our study also finds that the training objective has a higher impact on representation similarity than architectural choices such as building blocks (RNN/Transformer/CNN) and directionality (uni/bidirectional). Our results also suggest that there exists a strong correlation between pre-training loss and downstream performance for some self-supervised algorithms. |