IIIT Hyderabad Publications
Tag Me a Label with Multi-arm: Active Learning for Telugu Sentiment Analysis

Authors: Muku Sandeep, Subba Reddy Oota, Radhika Mamidi
Conference: 19th International Conference on Big Data Analytics and Knowledge Discovery (DaWaK 2017)
Pages: 355-367
Location: Lyon, France
Date: 2017-08-28
Report no: IIIT/TR/2017/104

Abstract

Sentiment analysis is one of the most active research areas in natural language processing and an extensively studied problem in data mining, web mining and text mining for the English language. With the proliferation of social media, data in regional languages is growing rapidly alongside English. Telugu is one such regional language with abundant data available on social media, but labeled training sets are hard to find because human annotation is time-consuming and costly. To address this issue, this paper investigates the practicality of active learning for Telugu sentiment analysis. We built a hybrid approach that combines different query selection strategy frameworks to obtain more accurate training instances from limited labeled data. Using a set of classifiers such as SVM, XGBoost, and Gradient Boosted Trees (GBT), we achieved promising results with a minimal error rate.

Full paper: pdf
Centre for Language Technologies Research Centre