Towards sentiment augmented predictive techniques in natural language

Author: Battu Varshit
Date: 2019-07-22
Report no: IIIT/TH/2019/86
Advisor: Radhika Mamidi

Abstract

Sentiment is an important feature of any text. Unlike facts, feelings and opinions are subjective, and analyzing them accurately is a challenging task. Through sentiment, we seek to understand a writer's attitude towards a specific topic in a piece of text. Sentiment analysis has long been an important topic of research. Simply put, the task involves a system that classifies the sentiment of a given input sentence as positive, neutral or negative. Sometimes finer-grained classes are used, and sometimes the neutral class is dropped, depending on what the sentiment is needed for. The ability to automatically compute sentiment for content such as user reviews, movie critiques and e-commerce feedback is of tremendous importance: it helps organizations and businesses make large-scale decisions that depend strongly on user satisfaction. Movies are one of the most prominent means of entertainment and a favourite pastime for many people, providing a break from a hectic daily schedule. People often prefer to express their views online in English rather than in local languages, partly because few applications make typing in local languages as easy as typing in English. Even when someone does write a review in a local language, very few platforms accept such reviews. This leaves very little data to work with in languages other than English. The widespread use of the Internet in recent times has led to large volumes of movie-related data being generated and shared online. People watch movies, write reviews and give ratings online. This way of broadcasting opinions has gained a lot of popularity, but it has also led to a decrease in the quality of the opinions shared.
As a result, people find it difficult to browse through all the opinions. Bogus and random opinions are especially common where users can give feedback quickly, for example through multiple-choice options or checkboxes. Movie ratings and genres fall into this category. They play an important role in tasks such as movie recommendation and verifying the relationship between user-submitted reviews and ratings, so the ability to predict the correct rating or genre of a movie would be useful. In this thesis, we attempt to solve real-life problems with the help of sentiment. The first problem is invalid information (wrong ratings), which we address using the sentiment of movie reviews. We propose methods to predict a movie's rating from its summary. We then use priors that are generally available alongside movie summaries to improve accuracy: we also consider the associated movie reviews when predicting the rating, and provide insights into why this helps our models perform better. We use the sentiment of the reviews along with the summary because sentiment captures a lot of essential information that can aid rating prediction. For a very long time, the majority of methods used to study NLP problems employed shallow machine learning models and time-consuming, hand-crafted features. This caused many problems, one of which is the curse of dimensionality, since linguistic information was represented sparsely. However, with the recent popularity and success of word and sentence embeddings, neural network based models have achieved better results on various language-related tasks than traditional machine learning models such as SVMs or logistic regression.
Deep learning methods are starting to outperform statistical methods on some challenging natural language processing problems, often with single, simpler models. We propose deep learning models for the problem at hand and compare the results with traditional methods. The second problem is preserving sentiment during translation. An important aspect of translation is ensuring that the complete meaning of the source text, including its sentiment or opinion, is carried over appropriately. For product reviews, for example, retaining the sentiment is essential. We cannot afford to have a phrase such as “not at all good” translated as “not good”, since the two phrases differ significantly in sentiment. Most metrics for evaluating machine translation consider only n-grams, the number of overlaps and similar surface measures when producing translation scores, which does not account for the preservation of sentiment. To address this, we propose a new metric that considers the sentiment of a sentence along with the existing means of evaluation.
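The abstract does not specify how the proposed metric is computed, so the following is only a minimal sketch of the general idea of combining n-gram overlap with sentiment agreement. Everything in it is an assumption made for illustration: the toy lexicon `POLARITY`, the negation handling, the weight `alpha`, and the function names are hypothetical and are not taken from the thesis.

```python
from collections import Counter

# Toy resources, assumed purely for illustration (not the thesis's lexicon).
POLARITY = {"good": 1.0, "great": 1.0, "bad": -1.0, "terrible": -1.0}
NEGATORS = {"not", "never"}


def sentiment(tokens):
    """Crude lexicon-based sentiment in [-1, 1]; 0.0 if no polarity word is found."""
    total, hits, scale, i = 0.0, 0, 1.0, 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in NEGATORS:
            scale = -0.5  # plain negation flips and dampens the next polarity word
            if tuple(tokens[i + 1:i + 3]) == ("at", "all"):
                scale = -1.0  # "not at all" is treated as a stronger negation
                i += 2
        elif tok in POLARITY:
            total += scale * POLARITY[tok]
            hits += 1
            scale = 1.0
        i += 1
    return total / hits if hits else 0.0


def ngram_precision(hyp, ref, n):
    """Fraction of hypothesis n-grams that also occur in the reference (clipped counts)."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(hyp_ngrams.values())
    if total == 0:
        return 0.0
    return sum((hyp_ngrams & ref_ngrams).values()) / total


def sentiment_aware_score(hypothesis, reference, alpha=0.5):
    """Blend surface overlap with sentiment agreement; alpha is a hypothetical weight."""
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    # Simplified overlap: mean of unigram and bigram precision (not full BLEU).
    overlap = (ngram_precision(hyp, ref, 1) + ngram_precision(hyp, ref, 2)) / 2.0
    # Sentiment scores lie in [-1, 1], so their gap is at most 2.
    agreement = 1.0 - abs(sentiment(ref) - sentiment(hyp)) / 2.0
    return (1.0 - alpha) * overlap + alpha * agreement


if __name__ == "__main__":
    reference = "not at all good"
    for candidate in ("not at all good", "not good"):
        print(candidate, "->", sentiment_aware_score(candidate, reference))
```

Under these assumptions, the candidate "not good" is penalized twice against the reference "not at all good": once for the missing n-grams and once for the weaker negative sentiment. A real implementation would presumably replace the toy lexicon with a trained sentiment classifier and the simplified overlap with an established translation metric.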

Full thesis: pdf

Centre for Language Technologies Research Centre

IIIT Hyderabad Publications
