2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

2021 IEEE International Conference on Acoustics, Speech and Signal Processing

6-11 June 2021 • Toronto, Ontario, Canada

Extracting Knowledge from Information

Technical Program

Paper Detail

Paper IDSPE-56.4
Paper Title Multi-task Estimation of Age and Cognitive Decline from Speech
Authors Yilin Pan, University of Sheffield, United Kingdom; Venkata Srikanth Nallanthighal, Radboud University Nijmegen, Netherlands; Daniel Blackburn, Heidi Christensen, University of Sheffield, United Kingdom; Aki Harma, Philips Research, Netherlands
SessionSPE-56: Paralinguistics in Speech
LocationGather.Town
Session Time:Friday, 11 June, 14:00 - 14:45
Presentation Time:Friday, 11 June, 14:00 - 14:45
Presentation Poster
Topic Speech Processing: [SPE-ANLS] Speech Analysis
IEEE Xplore Open Preview  Click here to view in IEEE Xplore
Virtual Presentation  Click here to watch in the Virtual Conference
Abstract Speech is a common physiological signal that can be affected by both ageing and cognitive decline. Often the effect can be confounding, as would be the case for people at, e.g., very early stages of cognitive decline due to dementia. Despite this, the automatic predictions of age and cognitive decline based on cues found in the speech signal are generally treated as two separate tasks. In this paper, multi-task learning is applied for the joint estimation of age and the Mini-Mental Status Evaluation criteria (MMSE) commonly used to assess cognitive decline. To explore the relationship between age and MMSE, two neural network architectures are evaluated: a SincNet-based end-to-end architecture, and a system comprising of a feature extractor followed by a shallow neural network. Both are trained with single-task or multi-task targets. To compare, an SVM-based regressor is trained in a single-task setup. i-vector, x-vector and ComParE features are explored. Results are obtained on systems trained on the DementiaBank dataset and tested on an in-house dataset as well as the ADReSS dataset. The results show that both the age and MMSE estimation is improved by applying multi-task learning, with state-of-the-art results achieved on the ADReSS dataset acoustic-only task.