Sentiment Lexicon Creation in Tamil Using Hybrid Techniques

Author: Abishek Kannan
Date: 2019-03-06
Report no: IIIT/TH/2019/24
Advisor: Radhika Mamidi

Abstract

Sentiment analysis is a discipline of Natural Language Processing that deals with identifying and analyzing the subjectivity of data. It is an important task with both commercial and academic functionality. It has acquired wide commercial use, including social media monitoring, survey response analysis, and review systems. In review systems and surveys, sentiment analysis allows us to understand the opinions of multiple respondents. It captures opinions such as like/dislike or for/against held by the general population. The merits of sentiment analysis do not stop there: its applications are broad and powerful, and the ability to extract meaningful insights from social data is being widely adopted by organizations across the world. Human language is complex, and teaching a machine to analyze the various grammatical nuances, cultural variations, slang, and misspellings that occur in online mentions is a difficult process. To perform various sentiment analysis tasks, we created a platform that employs different techniques and statistical methods to evaluate the data. By analyzing text with computational linguistic techniques, we are able to capture and quantify affective states and subjective information. Sentiment analysis, or opinion mining, is performed on human-generated data. Since sentiment knowledge in humans grows with age and daily interactions, cognitive knowledge is required to understand emotion. For machines to understand the underlying emotion, we need to transfer this cognitive knowledge in some form. The majority of computational approaches to opinion mining rely on pre-existing lexicons. It has been understood that such lexicons are vital for any sentiment analysis system. These lexicons are the closest representation of our cognitive knowledge. These computational lexicons encode sentiment information to assist in the task of opinion mining. Most research efforts in sentiment lexicon creation are bound to the English language.
SentiWordNet for English is one such important lexical resource that contains subjective polarity for each lexical item. It was devised to aid in the task of opinion mining and sentiment classification. With growing data in native vernaculars, there is a need for language-specific sentiment lexicons. For resource-poor languages, creating such a sentiment lexicon is a difficult task. This thesis describes two different methods for building such a resource for an Indian language, Tamil, using existing resources in other languages. Such a resource would serve as a baseline for further improvements in sentiment analysis specific to Tamil data. Furthermore, the architecture followed in the two methods can be easily adapted to other languages.

Full thesis: pdf

Centre for Language Technologies Research Centre

IIIT Hyderabad Publications
