IIIT Hyderabad Publications
Understanding non-native Speech using SLU systems

Author: Snehal Ranjan (2020121003)
Date: 2024-06-28
Report no: IIIT/TH/2024/125
Advisor: Chiranjeevi Yarra

Abstract

Spoken language understanding (SLU) systems are a critical component of modern dialog systems, enabling natural language interaction between humans and machines. However, their performance often degrades on non-native accents and grammatical errors, posing a significant barrier to widespread adoption and accessibility. This thesis presents a comprehensive investigation into the underlying causes of this limitation and proposes novel strategies to enhance the robustness and generalization capabilities of SLU models.

We construct SLU pipelines incorporating state-of-the-art automatic speech recognition (ASR) and natural language understanding (NLU) models, and benchmark their performance on standard datasets (ATIS, SNIPS) as well as synthetically generated data with controlled variations in accent and grammaticality. Our empirical evaluation reveals significant performance degradation on non-native speech, exposing weaknesses in capturing long-range dependencies and salient input regions. To corroborate these findings, we employ attention-based architectures and conduct targeted experiments that isolate the impact of accented speech and ungrammatical utterances.

The motivation for using attention-based models is their ability to provide interpretable insight into the model's decision-making process. By visualizing the attention weights, we can see which regions of the input sequence the model focuses on when making predictions. This interpretability is particularly valuable for identifying biases or vulnerabilities in how the model processes accented or ungrammatical speech.

Leveraging these insights from the attention mechanism, we propose a novel data augmentation strategy that systematically introduces accent and grammatical variations during training, thereby improving the models' ability to handle such challenges. The attention visualizations guide the augmentation process, allowing us to strategically perturb the input sequences in a way that mitigates the model's weaknesses and enhances its robustness.
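To make the cascaded setup concrete, the sketch below wires an off-the-shelf ASR model to a text intent classifier and scores intent accuracy on labelled audio, mirroring the kind of pipeline evaluation described above. It is a minimal sketch using the generic Hugging Face transformers pipeline API; the checkpoint names (openai/whisper-small, and especially my-org/bert-atis-intent) are illustrative placeholders, not the models used in the thesis.

```python
# Minimal SLU cascade: ASR transcribes audio, an NLU classifier predicts intent.
# Checkpoint names are placeholders; "my-org/bert-atis-intent" is hypothetical.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
intent_clf = pipeline("text-classification", model="my-org/bert-atis-intent")

def slu_predict(audio_path: str) -> str:
    """Run the two-stage pipeline on one utterance and return the predicted intent."""
    transcript = asr(audio_path)["text"]    # stage 1: speech -> text
    prediction = intent_clf(transcript)[0]  # stage 2: text -> intent label
    return prediction["label"]

def intent_accuracy(pairs) -> float:
    """Accuracy over (audio file, gold intent) pairs, e.g. native vs. non-native
    recordings of the same ATIS/SNIPS utterances."""
    correct = sum(slu_predict(path) == gold for path, gold in pairs)
    return correct / len(pairs)
```

A cascade like this also makes it straightforward to attribute errors to the ASR or NLU stage separately, which is how accent-induced degradation can be localized to one component or the other.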
Full thesis: pdf

Language Technologies Research Centre