This micro-project will study the adaptation of automatic speech recognition (ASR) systems to impaired speech. Specifically, it will focus on improving ASR for speakers with dysarthria and/or stuttering of varying severity. The work will be developed using the German “Lautarchive” data, comprising only 130 hours of untranscribed doctor-patient conversations in German, and/or the English TORGO dataset. Applying human-in-the-loop methods, we will spot individual errors and regions of low certainty in the ASR output so that human-originated improvements and clarifications can be fed back into the AI decision process.

Output

Paper for ICASSP 2021 and/or Interspeech 2022

Presentations

Project Partners:

  • Brno University of Technology (BUT), Mireia Diez
  • Technische Universität Berlin (TUB), Tim Polzehl

Primary Contact: Mireia Diez Sanchez, Brno University of Technology

Main results of micro project:

The project has run for less than 50% of its allocated time.


Contribution to the objectives of HumaneAI-net WPs

WP1 Learning, Reasoning and Planning with Human in the Loop
T1.1 Linking symbolic and subsymbolic learning

WP3 Human AI Interaction and Collaboration
T3.1 Foundations of Human-AI interaction and Collaboration
T3.6 Language-based and multilingual interaction
T3.7 Conversational, Collaborative AI

WP6 Applied research with industrial and societal use cases
T6.3 Software platforms and frameworks
T6.5 Health related research agenda and industrial use cases

Tangible outputs

  • Publication: M. K. Baskar et al., “Speaker adaptation for Wav2vec2 based dysarthric ASR”, Proc. Interspeech 2022 (full reference under Publications below)
  • Other: Internal report – Mireia Diez Sanchez, mireia@fit.vutbr.cz

Results Description

In this micro-project, we pursued making AI technology accessible to those who may have special needs when interacting with it: automatic speech recognition (ASR) for people with dysarthria.

Dysarthria is a motor speech disorder resulting from neurological injury and characterized by poor articulation of phonemes. Within ASR, dysarthric speech recognition is a challenging task due to the lack of supervised data and its limited diversity.

In particular, in this work we studied the performance of three types of ASR system on dysarthric speech: LF-MMI hybrid, Transformer and wav2vec2 models. The analysis revealed the superiority of the wav2vec2 models on this task. We then investigated the importance of speaker-dependent auxiliary features, such as fMLLR transforms and x-vectors, for adapting wav2vec2 models to dysarthric speech recognition. We showed that, in contrast to hybrid systems, wav2vec2 did not improve when its model parameters were adapted to each individual speaker.
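For illustration, the sketch below shows one common way to obtain per-utterance x-vectors, using the SpeechBrain toolkit and its pretrained VoxCeleb extractor; the model identifier and audio file name are assumptions for the example, and this is not necessarily the extractor used in the project.

    # Hedged sketch: extracting an x-vector speaker embedding with SpeechBrain.
    import torchaudio
    from speechbrain.pretrained import EncoderClassifier

    # Pretrained x-vector extractor (trained on VoxCeleb)
    classifier = EncoderClassifier.from_hparams(
        source="speechbrain/spkrec-xvect-voxceleb",
        savedir="pretrained_xvect",
    )

    signal, sample_rate = torchaudio.load("speaker_utterance.wav")  # hypothetical file
    xvector = classifier.encode_batch(signal)  # shape: (1, 1, 512)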
We proposed a wav2vec2 adapter module that takes speaker features as auxiliary information to perform effective speaker normalization during fine-tuning. We showed that, with the adapter module, fMLLR features and x-vectors are complementary to each other, and demonstrated the effectiveness of the approach by outperforming the existing state of the art on UASpeech dysarthric ASR.
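The following is a minimal sketch of the adapter idea, assuming hypothetical layer dimensions: a bottleneck adapter that conditions frame-level wav2vec2 representations on a per-utterance speaker embedding (e.g., an x-vector or fMLLR-derived vector). It illustrates the general technique, not the exact architecture from the paper.

    import torch
    import torch.nn as nn

    class SpeakerAdapter(nn.Module):
        """Bottleneck adapter: fuses a per-utterance speaker embedding
        into frame-level wav2vec2 hidden states (dimensions assumed)."""
        def __init__(self, hidden_dim=768, spk_dim=512, bottleneck=256):
            super().__init__()
            self.down = nn.Linear(hidden_dim + spk_dim, bottleneck)
            self.up = nn.Linear(bottleneck, hidden_dim)
            self.act = nn.ReLU()

        def forward(self, hidden_states, spk_emb):
            # hidden_states: (batch, frames, hidden_dim)
            # spk_emb:       (batch, spk_dim), broadcast over all frames
            frames = hidden_states.size(1)
            spk = spk_emb.unsqueeze(1).expand(-1, frames, -1)
            fused = torch.cat([hidden_states, spk], dim=-1)
            # Residual connection preserves the pretrained representation
            return hidden_states + self.up(self.act(self.down(fused)))

In a setup like this, the adapter would typically sit on top of (or between) the wav2vec2 encoder layers, and during fine-tuning one could update only the adapter and output-layer parameters while keeping most of the pretrained encoder frozen.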
In our cross-lingual experiments, we also showed that combining English and German data for training can further improve the performance of our systems, which proves useful in scenarios where few training examples exist for a particular language.

AdAIS addressed topics related to the tasks listed above under “Contribution to the objectives of HumaneAI-net WPs”: WP1 (T1.1), WP3 (T3.1, T3.6, T3.7) and WP6 (T6.3, T6.5).

Publications

Publication: M. K. Baskar, T. Herzig, D. Nguyen, M. Diez, T. Polzehl, L. Burget, J. Černocký, “Speaker adaptation for Wav2vec2 based dysarthric ASR”. Proc. Interspeech 2022, 3403-3407, doi: 10.21437/Interspeech.2022-10896

Links to Tangible results

Link to publication: https://www.isca-speech.org/archive/pdfs/interspeech_2022/baskar22b_interspeech.pdf

Open-source tool for training ASR models for dysarthric speech: https://github.com/creatorscan/Dysarthric-ASR
The repository contains a baseline recipe for training a TDNN-CNN hybrid ASR system, prepared for the TORGO dataset, and an end-to-end model using the ESPnet framework, prepared for the UASpeech dataset.