Computational Analysis of Humour

Author: Vikram Ahuja
Date: 2019-07-26
Report no: IIIT/TH/2019/91
Advisor: Radhika Mamidi

Abstract

In this thesis we focus on three major aspects of computational humour recognition. We start by categorizing humour based on the classical theories of humour along with features like theme, emotions and topics. We then look at the problem of recognizing humour in conversations and broadcast speeches, which are longer and more complex than short jokes. Finally, we differentiate between types of off-colour humour and try to separate insulting remarks from other off-colour humour, a task in which dark humour is often misclassified as insult humour. Most scholarly works in the field of computational humour detection derive their inspiration from the incongruity theory. Incongruity is indispensable for drawing a line between humorous and non-humorous occurrences, but it is inadequate for explaining what actually made a particular occurrence funny. Classical theories like the Script-based Semantic Theory of Humour (SSTH) and the General Theory of Verbal Humour (GTVH) achieve this to an adequate extent. We adopt a more holistic approach to the classification of humour, based on these classical theories with a few improvements and revisions. Through experiments based on our linear approach and performed on large datasets of short jokes, we demonstrate the adaptability and componentizability of our model, and show that a host of classification techniques can be used to overcome the challenging problem of distinguishing between various categories and sub-categories of jokes. Almost all studies in the field of computational humour recognition have been done on datasets consisting of short jokes, tweets and puns. We try to detect humour in conversations and broadcast speeches, as they are more complex and contain more contextual information than short jokes.
For automatic humour detection in monologues, we built a corpus containing humorous utterances from TED talks; for dialogues, we analysed data from the popular TV sitcom Friends, whose canned laughter gives an indication of when the audience would react. We classified dialogues and monologues as humorous or non-humorous using multiple deep learning methods. Our experiments show that such deep learning methods outperform the baseline by 21 accuracy points on the TED Talk dataset. Off-colour humour is a category of humour which is considered by many to be in poor taste or overly vulgar. Most commonly, off-colour humour contains remarks on a particular ethnic group or gender, violence, domestic abuse, acts concerned with sex, or excessive swearing and profanity. Blue humour, black humour and insult humour are types of off-colour humour. Blue and black humour, unlike insult humour, are not outright insulting in nature but are often misclassified as such because they contain insults and harmful speech. We provide an original dataset consisting of nearly 15,000 instances and a novel approach to separating black and blue humour from offensive humour, which is essential so that free speech on the internet is not curtailed. Our experiments show that deep learning methods outperform n-gram-based approaches such as SVMs, Naive Bayes and Logistic Regression by a large margin.

Full thesis: pdf

Centre for Exact Humanities

IIIT Hyderabad Publications
