IIIT Hyderabad Publications
Neural Machine Translation for Low Resource Languages

Author: Akshay Goindani (20171108)
Date: 2022-06-24
Report no: IIIT/TH/2022/86
Advisor: Manish Shrivastava

Abstract

Machine Translation is the task where a machine generates a sentence in a target language T, given an input sentence in a source language S, such that the generated and input sentences have similar semantics. Various methods have been proposed in the literature to solve this task. Deep neural network based translation models (Neural Machine Translation (NMT)) have been shown to achieve state-of-the-art performance for multiple language pairs. However, neural methods struggle to perform well for low-resource languages, which are understudied, suffer from data scarcity, and lack efficient language processing tools. Since neural methods require large amounts of data to train effective models, less complex statistical methods outperform them in these settings.

Transformer-based NMT models have achieved state-of-the-art performance for various languages. The multi-head attention mechanism, which runs multiple attention heads in parallel, underlies the strong performance of the Transformer model across applications (e.g., NMT, text classification). However, Transformer models do not perform well in low-resource conditions. In this thesis, we propose a novel Dynamic Head Importance Computation Mechanism (DHICM), which enhances the performance of the Transformer model, especially in low-resource conditions. In the multi-head attention mechanism, different heads attend to different parts of the input. The limitation is that multiple heads might attend to the same part of the input, making some heads redundant and leaving model resources underutilized. One approach to avoid this is to prune the least important heads based on an importance score. We instead focus on designing a head importance computation method that dynamically calculates the importance of each head with respect to the input. Our insight is to design an additional attention layer on top of multi-head attention, which uses the outputs of the attention heads together with the input to compute an importance score for each head. Additionally, we add an extra loss function that prevents the model from assigning the same score to all heads, helping the model identify the more important heads and improving performance (an illustrative sketch is given at the end of this page). We analyzed the performance of DHICM for NMT on different language pairs. Experiments on different datasets show that DHICM outperforms the traditional Transformer-based approach by a large margin, especially when less training data is available.

Code mixing is the phenomenon of mixing two or more languages, prevalent in multilingual communities. In recent years, due to the prevalence of social media platforms, there has been a surge in the usage of code-mixed text. Code-mixed texts are informal in nature and do not necessarily follow pre-defined syntactic structures. Due to this informal nature, traditional NLP systems built for monolingual input do not perform well on code-mixed input. Moreover, good quality code-mixed data is scarce, hence,

Full thesis: pdf

Centre for Language Technologies Research Centre
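For concreteness, the following is a minimal PyTorch sketch of the two components the abstract describes: an additional attention layer that computes a per-head importance score from the multi-head attention outputs together with the layer input, and an auxiliary entropy loss that discourages assigning the same score to all heads. The names (DynamicHeadImportance, head_score_entropy), the dot-product scoring, and the entropy form of the loss are illustrative assumptions, not the exact formulation used in the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DynamicHeadImportance(nn.Module):
    """Scores every attention head against the current input and reweights
    the per-head outputs accordingly (an assumed formulation of DHICM)."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        # Additional attention layer: queries come from the layer input,
        # keys come from the per-head outputs of multi-head attention.
        self.q_proj = nn.Linear(d_model, self.head_dim)
        self.k_proj = nn.Linear(self.head_dim, self.head_dim)

    def forward(self, x: torch.Tensor, head_outputs: torch.Tensor):
        # x:            (batch, seq, d_model)            layer input
        # head_outputs: (batch, seq, n_heads, head_dim)  multi-head outputs
        q = self.q_proj(x).unsqueeze(2)                  # (b, s, 1, hd)
        k = self.k_proj(head_outputs)                    # (b, s, h, hd)
        scores = (q * k).sum(-1) / self.head_dim ** 0.5  # (b, s, h)
        alpha = F.softmax(scores, dim=-1)                # head importance
        weighted = head_outputs * alpha.unsqueeze(-1)    # reweight each head
        # Concatenate the reweighted heads back to (b, s, d_model).
        return weighted.flatten(2), alpha


def head_score_entropy(alpha: torch.Tensor, eps: float = 1e-9) -> torch.Tensor:
    """Auxiliary loss: minimising the entropy of the importance distribution
    discourages assigning (near-)equal scores to all heads."""
    return -(alpha * (alpha + eps).log()).sum(-1).mean()
```

During training, this entropy term would be added to the standard translation loss with a small weighting coefficient; both the coefficient and the exact score function are assumptions here, as the abstract does not specify them.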