IIIT Hyderabad Publications
Transfer Learning for Low Resource Language Processing in Indian Context

Author: Anirudh Dahiya
Date: 2021-08-07
Report no: IIIT/TH/2021/100
Advisor: Dipti Misra Sharma

Abstract

As ever-increasing populations around the world gain access to digital technologies, and through them to information, knowledge, rights, and justice, it is imperative to ensure inclusiveness and accommodate the multitude of languages they use. As computing machines and methods have grown more sophisticated and capable by leaps and bounds, their use for automatically processing human language has gained traction. However, building and running these systems usually requires expensive resource creation and computation, which has limited the reach of recently proposed state-of-the-art language processing systems in resource-scarce languages and domains. Transfer learning methods aim to overcome this limitation by adapting systems pre-trained on existing large resources to serve low-resource domains, tasks, and languages. Research into these methods has drawn increasing attention as they have proven effective at mitigating resource constraints across tasks, domains, and languages. This study explores these transfer learning approaches in the context of presently used Indian languages, particularly Hindi and its code-mixed English-Hindi form, which is widely popular on social media. We explore both cross-task and cross-lingual transfer approaches for a variety of downstream tasks, and show their efficacy when training data and compute resources are constrained. We use a syntactico-semantic curriculum learning based formulation for English-Hindi code-mixed sentiment analysis, and show significant gains in performance.
We also explore a variety of lexical and sentence-level cross-lingual transfer approaches for discourse analysis, and demonstrate their efficacy under different training regimens for the discourse relation task. We also compare these approaches and gain insights into the nature of cross-lingual transfer.

Full thesis: pdf
Centre: Language Technologies Research Centre