IIIT Hyderabad Publications
Processing English Verb Phrase Ellipsis for Conversational English-Hindi Machine Translation

Author: Aniruddha Prashant Deshpande (20161058)
Date: 2023-11-24
Report no: IIIT/TH/2023/173
Advisor: Dipti Misra Sharma

Abstract

In this thesis, we tackle the problem of erroneous English-Hindi machine translation (MT) output caused by Verb Phrase Ellipsis (VPE) in English. VPE is prominent in spoken English, and the antecedent of the ellipsis can come from a previous sentence in the conversation. MT systems translate each sentence as a whole and ignore contextual information from previous sentences. For these two reasons, spoken English-Hindi translations suffer. We approached this problem by manually annotating 1200 two-person conversations that contain VPE and by studying how resolving the ellipsis affects translation quality. Based on this analysis, we designed a rule-based system for detecting and resolving VPE in English, with the goal of improving the quality of the subsequent Hindi translation. Our rule-based system is capable of the following: 1) detection of VPE, 2) resolution of the elided head verb, 3) resolution of the elided head verb's children, 4) resolution of non-verbal predicates of a copula or a 'be' main verb, and 5) modification of the original sentence in the conversation with the resolved verb phrase. To assess the scalability of our rule-based system, we also tested its performance on VPE datasets outside of our annotated data, including non-conversational data. The scalability test also allowed us to use the rule-based system for data augmentation. The augmented data was used to train a Transformer-based Seq2Seq generation model. This model helped us investigate whether a sequence-to-sequence model can be designed to handle instances of conversational English VPE, so that it could serve as a pre-processing step before English-Hindi machine translation. Furthermore, a comparative study of the rule-based and Transformer-based models was carried out as part of the research supporting this thesis.

Centre: Language Technologies Research Centre