IIIT Hyderabad Publications
Advanced Techniques in Hindi Automatic Post-Editing: Neural Models and Data Augmentation

Author: Pranav Nair (201525149)
Date: 2024-06-29
Report no: IIIT/TH/2024/127
Advisor: Dipti Misra Sharma

Abstract

Automatic post-editing (APE) is an important technique for improving the quality of machine translation (MT) output. In this thesis, we present an APE approach specifically for English-Hindi translation. Our method uses a sequence-to-sequence neural machine translation model to generate initial translations, then refines them with a combination of neural and data-augmentation-based post-editing techniques. We evaluate our approach on a large-scale dataset of English-Hindi translations and show significant improvements over the initial translations on standard automatic evaluation metrics such as BLEU, chrF, COMET, and TER. Our analysis further shows that the approach effectively corrects errors that MT systems commonly make for the English-Hindi language pair, such as incorrect word order and errors in grammatical agreement. We experiment with neural models, with data augmentation, and with a combination of the two, from which we derive an ensemble model that performs best on this task. These results highlight the effectiveness of different APE approaches and their potential to substantially improve the quality of MT output for this language pair. The different evaluation metrics also reveal distinct aspects of how the MT and APE models behave; we analyze and explain these differences, which gives deeper insight into the strengths and limitations of our approaches.

Centre: Language Technologies Research Centre
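TER, one of the metrics named in the abstract, counts the word edits needed to turn a system output into a reference translation, divided by the reference length; lower is better. The Python sketch below is a simplified, illustrative version that counts only insertions, deletions and substitutions. It omits the block shifts that standard TER also allows, so it is not the scorer used in the thesis. The Hindi sentences are hypothetical examples of a gender-agreement error of the kind the abstract describes, not data from the thesis.

```python
def edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    # prev[j] = cost of turning the first i-1 hyp words into the first j ref words
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete hyp word h
                curr[j - 1] + 1,         # insert ref word r
                prev[j - 1] + (h != r),  # substitute (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[-1]


def simple_ter(hyp: str, ref: str) -> float:
    """Simplified TER without shift operations: edits / reference length."""
    ref_tokens = ref.split()
    if not ref_tokens:
        raise ValueError("reference must be non-empty")
    return edit_distance(hyp.split(), ref_tokens) / len(ref_tokens)


if __name__ == "__main__":
    # Hypothetical example: "the girl goes to school".
    reference = "लड़की स्कूल जाती है"
    mt_output = reference.replace("जाती", "जाता")  # masculine verb: agreement error
    ape_output = reference                          # post-edited output fixes it
    print(simple_ter(mt_output, reference))   # 0.25 (1 substitution / 4 words)
    print(simple_ter(ape_output, reference))  # 0.0
```

A drop in TER from the MT output to the post-edited output is how an APE improvement shows up on this metric; reported results should use a full TER implementation that supports shifts.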
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.