IIIT Hyderabad Publications
TEASER: Towards Efficient Aspect-based SEntiment analysis and Recognition
Author: Bajaj Vaibhav Ganesh
Date: 2022-04-22
Report no: IIIT/TH/2022/35
Advisor: Radhika Mamidi

Abstract
With recent advances in networking sites, the Internet revolution has changed everything, from business to healthcare to education to the ways we communicate with our friends. The Internet has opened doors for people to express themselves, write their thoughts about a particular topic, and share an experience online for others to read without much hassle. Before going out, ordering something, or watching a show or movie, people tend to check the online reviews first. This is practical, because no one wants to spend time, money, and other resources on something that isn't worth it. With so many reviews, tweets, and other content available online, it is important to process them in a way that lets everyone use them productively. One use case arises from the generous length limits for online reviews on sites like IMDb, Zomato, and Amazon (10,000 characters for IMDb; 5,000 words, roughly 23,500 characters, for Amazon), which allow some reviews to run long. While the reviewer elaborates on their experience, what really matters from a reader's point of view is which aspects of the given target entities the reviewer liked or disliked. This is where the need for Aspect-Based Sentiment Analysis (ABSA) arises. ABSA aims to extract the aspects of the given target entities and their respective sentiments. The main issue is that the amount of data is so huge that processing it manually is impossible. For example, Twitter alone sees an average of 6,000 tweets per second, roughly 500 million tweets per day. Hence, we look for fast, automated methods that can do the processing almost in real time.
In this thesis, we propose a deep-learning-enabled model, TEASER, based on an extract-then-classify framework, for extracting aspects and detecting the sentiment attached to each. We conduct extensive experiments on three existing datasets (Restaurant14, Restaurant15, Laptop14) to show that TEASER performs better than existing models. In chapter 4, we present two novel datasets in the domain of movie reviews, Movie20 and moviesLarge. Movie20 is a supervised dataset of 1,162 sentences manually annotated by two human annotators, whereas moviesLarge is a pseudo-labeled dataset of 14,373 sentences. With the help of semi-supervised learning, we benchmark TEASER on the Movie20 dataset. We also evaluate TEASER's results on Movie20 thoroughly and try to explain the gaps between the predicted output and the gold annotation.

Full thesis: pdf
Centre: Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.