IIIT Hyderabad Publications
Conversational Humor Analysis: Developing Data, Annotation Schema and Models

Author: Vaishnavi Pamulapati
Date: 2021-12-29
Report no: IIIT/TH/2021/131
Advisor: Radhika Mamidi

Abstract

Conversational humor (CH) is a sub-domain of humor in which the participants (speakers or listeners) engage in different types of humor, such as retorts or teasing, for various purposes. While this phenomenon shares complexities with the larger domain of humor, several features are unique to CH. To address these complexities, we focus on two aspects of comprehending CH. First, we formulate a schema for a methodological approach to annotating CH, using a famous Telugu stage play, Kanyasulkam, as the medium of analysis. We then collect data on humorous and non-humorous conversations and run experiments with state-of-the-art NLP models to detect occurrences of CH.

Contemporary work on the purposes CH serves for interlocutors considers only a few techniques or types. Such analyses reveal the speaker's intention, but they do not capture the structure of CH. In our work, we devise a hierarchical schema with levels such as monologue/dialogue and benign/non-benign, which also identifies the types and techniques involved in CH.

In the pursuit of teaching a machine a language, we must bear in mind that for a model to perform reasonably well on tasks in any domain, we need ample data to feed the model. For low-resource languages such as Telugu, this is a difficult requirement to meet. For this reason, annotators are employed to provide metadata for a domain-specific dataset so that the model can learn the patterns belonging to that domain and produce satisfactory results. To this end, a work of literature is fully annotated by two annotators, A1 and A2, in our study. We then calculate the agreement between their annotations to give an objective measure of the validity of the hierarchical schema.
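The abstract does not name the agreement statistic used. As an illustration only, agreement between two annotators over the same items is commonly measured with Cohen's kappa, which corrects raw agreement for chance; a minimal sketch (the label values and annotator lists are hypothetical):

```python
from collections import Counter

def cohen_kappa(a1, a2):
    """Cohen's kappa for two annotators labeling the same n items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the chance agreement implied by each annotator's
    marginal label distribution.
    """
    assert len(a1) == len(a2) and len(a1) > 0
    n = len(a1)
    p_observed = sum(x == y for x, y in zip(a1, a2)) / n
    c1, c2 = Counter(a1), Counter(a2)
    p_expected = sum(c1[label] * c2[label] for label in c1) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical labels from annotators A1 and A2 on six conversation turns
ann1 = ["humor", "humor", "none", "humor", "none", "none"]
ann2 = ["humor", "none", "none", "humor", "none", "humor"]
print(round(cohen_kappa(ann1, ann2), 3))  # → 0.333
```

For a hierarchical schema, a kappa of this kind can be computed per level (e.g. monologue/dialogue, benign/non-benign) to localize where annotators diverge.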
Many researchers in NLP have provided profound insights into detecting humor in various forms of text (tweets, movie scripts, jokes). Nevertheless, few studies have focused on text classification of conversational humor, owing to the complexity of the domain. We attempt to facilitate research in this direction by using several popular models to classify humorous and non-humorous conversations automatically. Recurrent neural networks (RNNs) such as Long Short-Term Memory (LSTM) and Bidirectional Long Short-Term Memory (BiLSTM) networks learn sentence embeddings that are then used to classify text. In contrast, the Text Graph Convolutional Network (Text GCN) we use simultaneously learns the class a conversation belongs to and word embeddings based on word co-occurrence and document-word relations. To make the most of well-acclaimed pre-trained models, we fine-tune FastText word embeddings and different BERT (Bidirectional Encoder Representations from Transformers) models to generate sentence embeddings. We further use these models to classify text and compare their performance on standard evaluation metrics.

Full thesis: pdf

Centre for Language Technologies Research Centre
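The abstract notes that Text GCN learns from word co-occurrence and document-word relations. In the standard Text GCN formulation, this means one heterogeneous graph over documents and words, with document-word edges weighted by TF-IDF and word-word edges by positive pointwise mutual information (PMI) from sliding-window co-occurrence. A minimal sketch of that graph construction (function name, window size, and the toy tokenized conversations are our own illustration, not the thesis code):

```python
import math
from collections import Counter

def build_textgcn_edges(docs, window=2):
    """Edge weights of a Text-GCN-style graph over documents and words.

    Returns (doc_word, word_word):
      doc_word[(doc_index, word)]  = TF-IDF weight
      word_word[(word_a, word_b)]  = PMI weight (positive PMI only)
    """
    n_docs = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency

    # Document-word edges: term frequency * inverse document frequency.
    doc_word = {}
    for i, d in enumerate(docs):
        tf = Counter(d)
        for w, c in tf.items():
            doc_word[(i, w)] = (c / len(d)) * math.log(n_docs / df[w])

    # Word-word edges: PMI over sliding windows across all documents.
    windows = []
    for d in docs:
        for j in range(max(1, len(d) - window + 1)):
            windows.append(d[j:j + window])
    n_win = len(windows)
    p_w = Counter(w for win in windows for w in set(win))
    pair = Counter()
    for win in windows:
        uniq = sorted(set(win))
        for a in range(len(uniq)):
            for b in range(a + 1, len(uniq)):
                pair[(uniq[a], uniq[b])] += 1

    word_word = {}
    for (a, b), c in pair.items():
        pmi = math.log(c * n_win / (p_w[a] * p_w[b]))
        if pmi > 0:  # keep only positively associated word pairs
            word_word[(a, b)] = pmi
    return doc_word, word_word

# Toy tokenized "conversations" (hypothetical)
docs = [["quip", "retort"], ["quip", "retort"], ["plain", "talk"]]
doc_word, word_word = build_textgcn_edges(docs)
# word_word[("quip", "retort")] is a positive PMI edge;
# doc_word[(2, "plain")] is a positive TF-IDF edge.
```

A graph convolutional network trained on this adjacency then propagates label information from labeled document nodes through shared word nodes, which is how Text GCN learns word and document representations jointly rather than from pre-trained sentence embeddings.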