2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Paper Detail

Paper ID: DEMO-2.1
Paper Title: Groove2Groove: Style Transfer for Music Accompaniments
Authors: Ondřej Cífka, Télécom Paris, Institut Polytechnique de Paris, France; Umut Şimşekli, Inria/ENS, France; Gaël Richard, Télécom Paris, Institut Polytechnique de Paris, France
Session: DEMO-2: Show and Tell Demonstrations 2
Location: Zoom
Session Time: Friday, 11 June, 08:00 - 09:45
Presentation Time: Friday, 11 June, 08:00 - 09:45
Presentation: Poster
Topic: Show and Tell Demonstration: Demo
Abstract: Groove2Groove (Grv2Grv) is a music accompaniment style transfer system. Given two pieces of music – a content input and a style input – it generates a new accompaniment for the first piece in the style of the second. One can then combine the generated accompaniment with the melody of the content input to obtain a full cover song in the desired style. As one of our favorite examples, we can use Groove2Groove to render Nirvana's classic Lithium in the style of the funk song Fantastic Voyage by Lakeside. The system, described in our paper "Groove2Groove: One-Shot Music Style Transfer with Supervision from Synthetic Data" (https://doi.org/10.1109/TASLP.2020.3019642), is based on sequence-to-sequence (seq2seq) neural networks known from the neural machine translation (NMT) field. The network is trained in a fully supervised way to generate the desired output given the style/content input pair; this supervised learning scheme leverages a synthetic training dataset constructed for the purpose. The demo, available at http://tiny.cc/grv2grv, enables interactive exploration of pre-generated examples, as well as re-running the system with different settings and even uploading custom inputs. The user can edit the inputs by selecting a specific section or subset of instruments and then use the neural network to generate a new accompaniment. This accompaniment can then be remixed with some of the tracks of the content input to create a cover song, which can be played directly or downloaded.
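To make the seq2seq formulation above concrete, the following is a minimal, hypothetical PyTorch sketch of a style-conditioned encoder-decoder. It is not the actual Groove2Groove implementation: the token-based music representation, the GRU encoders, the concatenation-based conditioning, and all names and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch of a content/style-conditioned seq2seq model (not the
# actual Groove2Groove code); assumes music is encoded as discrete event tokens.
import torch
import torch.nn as nn

class AccompanimentStyleTransfer(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # One encoder summarizes the content input, another the style input.
        self.content_encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        self.style_encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
        # The decoder generates the new accompaniment, conditioned on both
        # summaries (here simply concatenated to every decoder input step).
        self.decoder = nn.GRU(emb_dim + 2 * hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, content_tokens, style_tokens, target_tokens):
        _, h_content = self.content_encoder(self.embed(content_tokens))
        _, h_style = self.style_encoder(self.embed(style_tokens))
        cond = torch.cat([h_content[-1], h_style[-1]], dim=-1)
        cond = cond.unsqueeze(1).expand(-1, target_tokens.size(1), -1)
        # Teacher forcing: target_tokens would be the shifted ground-truth
        # accompaniment from the synthetic parallel dataset.
        dec_in = torch.cat([self.embed(target_tokens), cond], dim=-1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out)  # logits over the output event vocabulary

# Fully supervised training step on dummy data: the target is the content
# piece re-rendered in the style of the style input.
model = AccompanimentStyleTransfer(vocab_size=1000)
content = torch.randint(0, 1000, (2, 64))
style = torch.randint(0, 1000, (2, 64))
target = torch.randint(0, 1000, (2, 64))
logits = model(content, style, target)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 1000), target.reshape(-1))
loss.backward()
```

The key point mirrored from the abstract is the fully supervised setup: each training example pairs a content input and a style input with a ground-truth accompaniment taken from the synthetic parallel dataset.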