IIIT Hyderabad Publications
Breaking Language Barriers: A Study On Advancing Aspect-Based Sentiment Analysis for Low Resource Languages

Author: Arghya Bhattacharya
Date: 2023-07-14
Report no: IIIT/TH/2023/126
Advisor: Manish Shrivastava

Abstract

In recent years, with the spread of technology and the Internet, the amount of opinionated data targeting products has increased exponentially. This growth has created a need to understand opinions and their associated sentiment, both to improve the feedback loop for the organizations that manufacture products and to help users who rely on opinions about a product's usability when making purchasing decisions. The major bottleneck in achieving this is the limited availability of resources for certain languages. We explore ways to work within these constraints and attempt to achieve good results on Sentiment Analysis and its variants.

In this thesis, we first attempt to extract language-invariant features for downstream tasks such as sentiment analysis, so that decent performance can be retained in a low-resource setting. We find that the ability to do so is task-dependent, and we hypothesize patterns in the tasks where the approach can and cannot work effectively. We then take a closer look at the reasons for the poor performance of models on Aspect Term Extraction and Aspect Term Polarity Classification for Hindi, which are variants of Sentiment Analysis based on Opinion Mining. After a detailed analysis, we conclude that there is a gap in the quality of the existing gold-standard dataset for Hindi. We then describe our methodology for developing a high-quality dataset parallel to the gold English dataset for these tasks and establish that the new dataset adequately represents the task.
To further improve the state of Aspect Term Extraction (ATE) and Aspect Term Polarity Classification (ATPC), we develop a novel architecture that achieves new state-of-the-art results for Hindi and near state-of-the-art results for English. We also show the usefulness of our method in a multilingual setting, where it achieves near state-of-the-art results. This establishes that, for tasks where language-invariant features cannot be extracted, we can still develop models that learn the features crucial to the task in a way that reliably yields high performance.

Full thesis: pdf

Centre for Language Technologies Research Centre