IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022
  • Virtual (all paper presentations)
22-27 May 2022
  • Main Venue: Marina Bay Sands Expo & Convention Center, Singapore
27-28 October 2022
  • Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China

IEP-14: Closing the Gap Between Probabilities and Decisions in Temporal Detection and Classification
Thu, 12 May, 21:00 - 21:45 China Time (UTC +8)
Thu, 12 May, 13:00 - 13:45 UTC
Location: Gather Area P
Virtual
Gather.Town
Expert
Presented by: Dr Çağdaş Bilen, Audio Analytic Ltd

Sound recognition is a prominent field of machine learning that has penetrated our everyday lives. It is already in active use in millions of smart homes, and on millions of smartphones and smart speakers.

Temporal event detection applications, such as sound event detection (SED) or keyword spotting (KWS), often aim for low-latency, low-power operation across a broad range of constrained devices without compromising performance.

In these temporal event detection problems, models are often designed and optimized to estimate instantaneous probabilities and rely on ‘ad-hoc decision post-processing’ to determine the occurrence of events. Within the constraints of commercial deployment, the impact of post-processing on the final performance of the product can be significant, sometimes reducing errors by orders of magnitude. However, these constraints are often ignored in academic challenges and publications, where the focus is therefore mainly on improving the performance of the ML models. Furthermore, both in academia and industry, the design and optimization of the ML models often disregard the effect of such decision post-processing, potentially leading to suboptimal performance and products.
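To make the idea of decision post-processing concrete, here is a minimal sketch of one generic way that frame-level probabilities could be turned into event decisions: smoothing, thresholding, and discarding detections that are too short. It is an illustration only and not the method presented in the talk; the function names and the parameters `threshold`, `window` and `min_frames` are hypothetical.

```python
from typing import List, Tuple


def moving_average(probs: List[float], window: int) -> List[float]:
    """Smooth frame probabilities with a centered moving average."""
    half = window // 2
    smoothed = []
    for i in range(len(probs)):
        lo = max(0, i - half)
        hi = min(len(probs), i + half + 1)
        smoothed.append(sum(probs[lo:hi]) / (hi - lo))
    return smoothed


def detect_events(
    probs: List[float],
    threshold: float = 0.5,   # hypothetical decision threshold
    window: int = 3,          # hypothetical smoothing window, in frames
    min_frames: int = 2,      # hypothetical minimum event duration, in frames
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of detected events, end exclusive."""
    active = [p >= threshold for p in moving_average(probs, window)]
    events = []
    start = None
    # A trailing False closes any event still open at the end of the signal.
    for i, is_on in enumerate(active + [False]):
        if is_on and start is None:
            start = i
        elif not is_on and start is not None:
            if i - start >= min_frames:
                events.append((start, i))
            start = None
    return events


# Made-up probabilities: one isolated spike (frame 2) and one sustained event.
probs = [0.1, 0.1, 0.9, 0.1, 0.1, 0.8, 0.9, 0.9, 0.8, 0.1, 0.1]

# Thresholding the raw probabilities alone reports the spike as an event.
print(detect_events(probs, window=1, min_frames=1))  # [(2, 3), (5, 9)]

# Smoothing plus a minimum duration keeps only the sustained event.
print(detect_events(probs))  # [(5, 9)]
```

On this toy input the model outputs are identical in both calls; only the post-processing differs, and it alone decides whether the spurious one-frame detection is reported.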

In this talk, I aim to demonstrate the importance of decision post-processing in temporal ML problems and help to bring it to the attention of the broad research community so that more optimized solutions can be realized.

Biography

Dr Çağdaş Bilen gained his PhD from NYU Tandon School of Engineering and went on to do postdoctoral work at Strasbourg University, Technicolor and INRIA. He has also worked in other research labs such as AT&T (Bell) Labs and HP Labs before joining Audio Analytic in 2018.

He has a keen interest in a greater sense of hearing.

Dr Bilen has authored articles in highly respected international journals and conferences and holds numerous patents on the topics of audio and multimedia signal representation, estimation and modelling. These cover audio inverse problems (audio inpainting, source separation and audio compression) using nonnegative matrix factorization, as well as fast image search algorithms using sparsity and deep learning.

“My role at Audio Analytic allows me the opportunity to apply my passion for signal processing and machine learning and to explore how a greater sense of hearing can re-shape the way that humans and machines interact.”

Çağdaş leads Audio Analytic’s respected research team in developing core technologies and tools that can further advance the field of machine listening. This cutting-edge work has led to a number of significant technical breakthroughs and patents, such as loss function frameworks, post-biasing technology, a powerful temporal decision engine, and an approach to model evaluation called Polyphonic Sound Detection Score (PSDS) which has been adopted as an industry-standard metric by the DCASE Challenge.