IIIT Hyderabad Publications
Towards Adapting Curriculum Learning for Sentiment Analysis: Challenges and Analyses

Author: Anvesh Rao Vijjini
Date: 2021-04-30
Report no: IIIT/TH/2021/37
Advisor: Radhika Mamidi

Abstract

Curriculum Learning (CL) strategies have been proposed for text classification tasks. These strategies assign a "difficulty" score to each training sample, which induces an ordering over the training data. Curriculum Learning hypothesizes that training on the easier samples first, followed by the more difficult ones, improves performance. The idea derives from cognitive science's theory of how human and animal brains learn: a difficult task can be made easier by phrasing it as a sequence of tasks that progress from easy to difficult. The application of curriculum learning to sentiment analysis is a rather new topic and hence requires thorough experiments and analysis. We study the effectiveness of the popular language model BERT when coupled with curriculum learning.

Previous works have approached Curriculum Learning, especially for text classification tasks such as sentiment analysis, with two major pacing methods: Baby Steps and One Pass. A pacing method decides how the model observes the curriculum-divided data, irrespective of the architecture and the curriculum strategy. Baby Steps increases the training data for the classification model cumulatively: the model is first trained on an easy set, and harder data is then added to the previous set at each step. This makes training time-consuming. In contrast, One Pass splits the training data into distinct, mutually exclusive sets and trains on each of these sets one after the other. In practice, however, One Pass, while faster, has consistently been shown to underperform both Baby Steps and training with no curriculum at all. We analyze the reasons behind this consistent failure. Our experiments show that One Pass suffers from catastrophic forgetting: every time it is trained on the next set, the model forgets the previous one.
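To make the difference between the two pacing methods concrete, the following is a minimal Python sketch of the data schedules that Baby Steps and One Pass would produce from the same difficulty ordering. It is not the thesis's implementation. The function names (split_into_buckets, baby_steps_schedule, one_pass_schedule), the equal-sized bucketing, and the use of word count as a stand-in difficulty score are all assumptions made for illustration; model training itself is left out.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_buckets(
    samples: Sequence[T],
    difficulty: Callable[[T], float],
    num_buckets: int,
) -> List[List[T]]:
    """Sort samples from easy to hard and cut them into contiguous buckets.

    Equal-sized buckets are an assumption of this sketch; the last bucket
    may be smaller when the data does not divide evenly.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    ordered = sorted(samples, key=difficulty)
    if not ordered:
        return []
    size = -(-len(ordered) // num_buckets)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def baby_steps_schedule(buckets: List[List[T]]) -> List[List[T]]:
    """Cumulative pacing: phase k trains on buckets 0..k together."""
    schedule: List[List[T]] = []
    seen: List[T] = []
    for bucket in buckets:
        seen = seen + bucket  # harder data is added to the previous set
        schedule.append(list(seen))
    return schedule


def one_pass_schedule(buckets: List[List[T]]) -> List[List[T]]:
    """Disjoint pacing: phase k trains on bucket k only."""
    return [list(bucket) for bucket in buckets]


# Hypothetical toy data; word count stands in for a difficulty score.
samples = [
    "good",
    "not bad at all",
    "very good",
    "I did not dislike it that much",
]
buckets = split_into_buckets(samples, lambda s: len(s.split()), num_buckets=2)
baby = baby_steps_schedule(buckets)
one = one_pass_schedule(buckets)

# Baby Steps revisits earlier data, so its phases grow: 2, then 4 samples.
print([len(phase) for phase in baby])  # [2, 4]
# One Pass never revisits earlier data: 2, then 2 samples.
print([len(phase) for phase in one])   # [2, 2]
```

In this sketch the final Baby Steps phase contains the whole training set, which reflects why it is slower, while the final One Pass phase contains only the hardest bucket, so nothing in that phase rehearses the easier samples seen earlier.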
We further derive an effective curriculum strategy for sentiment analysis based on SentiWordNet. Unlike the sentence-length strategy, which trains the network on samples ordered by sentence length, the proposed strategy trains the network on a difficulty ordering that is specific to the task of sentiment analysis. Our results across multiple deep learning architectures demonstrate that such task-specific difficulty is an important prerequisite for an effective curriculum strategy.

We then analyze curriculum learning's relation to the difficulty of a task through experiments with BERT. Previous works have suggested that curriculum learning yields larger improvements on harder tasks and smaller improvements on easier tasks. We call this the Task Difficulty Hypothesis. We experiment on multiple synthetic datasets to provide stronger support for this hypothesis.

Finally, we visualize curriculum learning through a novel attention movement visualization methodology. These visualizations form a qualitative study that aids in understanding how curriculum learning works. We see that curriculum learning breaks complicated tasks down into easier sub-tasks, which it addresses step by step.

Full thesis: pdf
Centre: Language Technologies Research Centre