IEEE ICASSP 2022 || Singapore || 7-13 May 2022 Virtual; 22-27 May 2022 In-Person

IEP-13: How Amazon Astro Estimates the Directions of Sound Sources

Thu, 12 May, 20:00 - 20:45 China Time (UTC +8)
Thu, 12 May, 12:00 - 12:45 UTC

Location: Gather Area P

Virtual

Gather.Town

Expert

Presented by: Wai C. Chu, Amazon Lab126

Astro is Amazon’s household robot, released in September 2021. It is designed for home security monitoring, remote elder care, video calls, photo/selfie taking, music playback, and all tasks performed by a regular Amazon Echo device; it is connected to the cloud and has access to all its resources. Its two wheels enable mobility, which allows it to follow a user with entertainment or to deliver calls, messages, timers, alarms, or reminders. Other capabilities include floor plan mapping, face recognition, high-fidelity audio reproduction, acoustic event recognition, and finding the directions of sound sources.

Sophisticated hardware components are incorporated into Astro to support advanced functionalities; these include multiple cameras and sensors for obstacle avoidance, navigation, and stereo depth. An eight-mic array is incorporated for audio capture. Acoustic events in the vicinity are monitored, with their directions estimated using sound source localization (SSL) algorithms. By knowing the directions of acoustic events, Astro can formulate the most suitable response depending on the circumstances; for instance, when a user utters the wake-word “Astro”, the robot rotates to face the user, based on the estimated direction of the wake-word.
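The abstract does not state which SSL algorithm Astro uses. As a generic illustration of how a direction can be estimated from a microphone array, the following is a minimal sketch of one widely used textbook technique, GCC-PHAT time-delay estimation, applied to a single pair of microphones under a far-field assumption. The function names, the 343 m/s speed of sound, and the two-microphone geometry are assumptions made for illustration only and are not taken from the talk.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate how many seconds `sig` lags `ref` using GCC-PHAT.

    Hypothetical helper for illustration; not Astro's implementation.
    """
    n = len(sig) + len(ref)  # zero-pad so the correlation is linear
    sig_f = np.fft.rfft(sig, n=n)
    ref_f = np.fft.rfft(ref, n=n)
    cross = sig_f * np.conj(ref_f)
    # PHAT weighting: keep only the phase of the cross-spectrum
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:
        # Restrict the search to physically possible delays
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that index `max_shift` corresponds to zero lag
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


def delay_to_angle_deg(tau, mic_spacing):
    """Convert a delay to a far-field angle from broadside, in degrees."""
    s = np.clip(SPEED_OF_SOUND * tau / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

For example, with two microphones 0.1 m apart sampled at 16 kHz, one would call `gcc_phat_delay(sig, ref, 16000, max_tau=0.1 / SPEED_OF_SOUND)` and pass the result to `delay_to_angle_deg(tau, 0.1)`. A single pair only resolves an angle relative to its own axis; an array with more microphones, such as the eight-mic array mentioned above, would typically combine several pairs or use a different method altogether.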

Direction finding for sound sources is challenging for Astro because of the conditions found in typical indoor environments: reflection, reverberation, acoustic interference, multiple sources, etc. Other factors that further complicate development are obstruction and motion. In this talk we describe the design choices for SSL in Astro. We start with the microphone array, followed by SSL algorithms; we then delve into reflection rejection, noise and interference suppression, performance measurement criteria, and the test framework. We further describe the role of the SSL module within the larger system known as the audio front end (AFE [1]), and its interactions with other modules such as the wake-word detector and floor plan mapping.

[1] Chu et al., Multichannel Audio Front End for Far-field Speech Recognition, EUSIPCO 2018.

Biography

Wai C. Chu is a Principal Scientist in the Audio Technology Team at Amazon Lab126. He joined Amazon in 2010 and has been involved with algorithm design and software implementation for several projects requiring audio and speech processing. He is the system and software co-architect for audio front end (AFE) processing in Echo (Doppler), and has designed algorithms for acoustic echo cancellation, dereverberation, noise reduction, packet loss concealment, residual echo suppression, etc. He has served as technical leader for Alexa voice communication since December 2015. Besides developing the on-device software for speech processing and deploying it successfully to millions of Echo devices, he also designed cloud-based speech enhancement solutions for messaging. Currently his main focuses are sound source localization and speech quality assessment. His primary programming language is C++, with exposure to Java and Python. Prior to joining Amazon he held engineering positions in corporations such as Texas Instruments and NTT DoCoMo, and start-ups such as Intervideo and Shotspotter. He received a PhD in Electrical Engineering from the Pennsylvania State University. He has an extensive publication record with 1600 citations in Google Scholar, is a regular reviewer for various conferences and journals, holds 30 issued US patents, and is the author of the textbook "Speech Coding Algorithms" (Wiley 2003).

LinkedIn: https://www.linkedin.com/in/wai-chu-b539942/

Google Scholar: https://scholar.google.com/citations?user=itLWaaYAAAAJ&hl=en&oi=ao

Research Gate: https://www.researchgate.net/profile/Wai-Chu-7

Amazon Author: https://www.amazon.com/Wai-C.-Chu/e/B001HMKLK4%3Fref=dbs_a_mng_rwt_scns_share

Publons: https://publons.com/researcher/1509783/wai-c-chu/

IEEE ICASSP 2022

2022 IEEE International Conference on Acoustics, Speech and Signal Processing

7-13 May 2022

Virtual (all paper presentations)

22-27 May 2022

Main Venue: Marina Bay Sands Expo & Convention Center, Singapore

27-28 October 2022

Satellite Venue: Crowne Plaza Shenzhen Longgang City Centre, Shenzhen, China
