IIIT Hyderabad Publications
Automatic Classification of Conversational Humor with a focus on COVID-19 tweets
Author: Gayatri Purigilla
Date: 2023-07-03
Report no: IIIT/TH/2023/120
Advisor: Radhika Mamidi
Centre: Language Technologies Research Centre

Abstract

We humans are social beings, and our communication is the most evolved and well-structured communication system that we are aware of. An essential aspect of human communication, one that helps people bond faster and develop a sense of closeness, is the use of humor. Humor that occurs as part of a conversation is known as conversational humor. Contrary to what it may seem, conversational humor is a unique type of humor that involves more than plugging canned jokes into a conversation: it requires the use of certain techniques and the presence of at least two interlocutors who understand the context of the conversation. The first step towards understanding conversational humor is to identify the types into which it can be categorized and the techniques used to generate each type. Current studies either consider only a subset of these types and techniques or are domain-specific. To tackle these challenges, we first propose a hierarchical annotation schema that gives a comprehensive overview of conversational humor. For this task, we use humorous utterances from Kanyasulkam, a famous Telugu play, as the dataset. The schema includes tags for type, technique, and benignity, and it accounts for cultural nuances in the text, making it an extensive schema for conversational humor.

Further, to test the universality of the schema, we built a dataset from a different domain (COVID-19-based humor) and in a different language (English). This dataset was annotated using the part of the annotation schema containing the type and technique tags. Two more tags, viz. “Situation” and “Relevance”, were added to the schema to make the dataset more valuable as a standalone resource that researchers in other fields, such as marketing and sociology, can use. The effectiveness of this dataset is tested through experiments on binary as well as multi-label, multi-class classification using state-of-the-art ML models, including BERT, RoBERTa, and BERTweet. Based on the accuracy and analysis from these experiments, we show that the annotation schema is universal in terms of language and domain. Such classification can be used to accelerate the annotation of humor data, and the annotated data can serve various purposes, such as marketing, connecting with a target audience based on the Relevance tag, and aiding research on conversational humor toward building humorous chatbots and more human-like interactive systems.
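To make the multi-label, multi-class setting concrete, here is a minimal sketch under stated assumptions. The tag names below are hypothetical placeholders (the abstract does not list the actual technique tags), and the code is a generic illustration of multi-label target encoding and decoding, not the thesis's implementation. An utterance annotated with several techniques becomes a multi-hot target vector, and a classifier's per-label scores (logits) are decoded with an independent sigmoid per label and an assumed 0.5 threshold, so more than one tag can be predicted at once.

```python
import math
from typing import List, Sequence

# Hypothetical technique tags for illustration only; not the thesis's actual tag inventory.
TECHNIQUE_TAGS: List[str] = ["exaggeration", "irony", "pun", "sarcasm"]


def to_multi_hot(tags: Sequence[str], vocab: Sequence[str]) -> List[int]:
    """Encode an annotated tag set as a multi-hot target vector over vocab."""
    unknown = set(tags) - set(vocab)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [1 if tag in tags else 0 for tag in vocab]


def decode_logits(
    logits: Sequence[float], vocab: Sequence[str], threshold: float = 0.5
) -> List[str]:
    """Apply an independent sigmoid to each label's logit and keep labels at or above threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [tag for tag, p in zip(vocab, probs) if p >= threshold]


if __name__ == "__main__":
    # An utterance tagged with two techniques at once is a multi-label example.
    print(to_multi_hot(["irony", "sarcasm"], TECHNIQUE_TAGS))  # [0, 1, 0, 1]
    # Made-up logits, as a classifier head might output for the four tags.
    print(decode_logits([-2.0, 1.5, -0.3, 0.2], TECHNIQUE_TAGS))  # ['irony', 'sarcasm']
```

Using an independent sigmoid per label, rather than a softmax over all labels, is what allows several tags to be active for the same utterance. The binary task is the degenerate case with a single label.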
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.