IIIT Hyderabad Publications
Towards efficient Neural Machine Translation for Indian Languages

Author: Ruchit Agrawal
Date: 2017-10-27
Report no: IIIT/TH/2017/79
Advisor: Dipti Misra Sharma

Abstract

Machine Translation among Indian languages is a challenging problem, owing to factors such as their morphological complexity and diversity, as well as the lack of sufficient parallel data for most language pairs. Earlier work has employed rule-based and statistical techniques to approach the problem of Indian language MT. Neural Machine Translation (NMT) is an emerging technique that has shown impressive performance, better than traditional MT methods in multiple aspects. This thesis demonstrates the application of NMT techniques to Indian languages, with an emphasis on two important directions:

1. Usage of specific linguistic features belonging to Indian languages to improve translation quality.
2. Building a robust NMT model which delivers efficient performance across different domains with a limited parallel corpus.

We create NMT systems for 110 Indian language pairs, utilizing various morphological and syntactic features to improve translation quality. We observe that although NMT models have a strong capacity to learn language constructs, the usage of specific features further helps in improving performance. We also propose a three-phase integrated approach which helps in improving robustness across domains as well as translation quality in the absence of large parallel corpora. The three-phase training shows a significant improvement in accuracy as well as coverage over a baseline NMT model. To the best of our knowledge, this is the first effort towards developing Neural Machine Translation for Indian languages.

Full thesis: pdf

Centre for Language Technologies Research Centre