IIIT Hyderabad Publications
Exploring Sentiment Analysis in Low-resource Languages

Author: Monil Gokani 2018114001
Date: 2023-12-29
Report no: IIIT/TH/2023/190
Advisor: Radhika Mamidi

Abstract

Sentiment Analysis is an important task for analysing online content across languages, with applications such as content moderation and opinion mining. However, state-of-the-art NLP modelling techniques often require a large amount of training data to achieve their results. High-quality annotated data is scarce for many languages other than English, including most Indian languages. In this thesis, we attempt to tackle this data scarcity in two ways: by creating additional resources, and by exploring more data-efficient modelling techniques.

Over the past few years, some significant resources for Sentiment Analysis have been developed in several Indian languages, but no large-scale, open-access corpora exist for Gujarati. In this thesis, we present and describe the Gujarati Sentiment Analysis Corpus (GSAC), which has been sourced from Twitter and manually annotated by native speakers of the language. We describe our collection and annotation processes in detail and conduct extensive experiments on our corpus to provide reliable baselines for future work using our dataset.

We then explore modelling techniques that work well in a low-resource setting by experimenting with AfriSenti, a collection of sentiment analysis datasets in 12 African languages. We propose an XGBoost-based ensemble model trained on emoticon frequency-based features and on the predictions of several statistical models (SVMs, Logistic Regression, and Random Forests) and BERT-based pretrained language models (AfriBERTa and AfroXLMR). We also report results from additional experiments that are not part of the system, and conduct an ablation study to observe the effects of different types of models and features on the final ensemble.

Full thesis: pdf
Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.