IIIT Hyderabad Publications |
|||||||||
|
A Corpus of English-Hindi Code-Mixed Tweets for Sarcasm DetectionAuthors: Sahil Swami,Ankush Khandelwal,Vinay Singh,Syed S. Akhtar,Manish Shrivastava Conference: 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing-2018 2018) Location Hanoi, Vietnam Date: 2018-03-18 Report no: IIIT/TR/2018/118 AbstractSocial media platforms like twitter and facebook have become two of the largest mediums used by people to express their views towards different topics. Generation of such large user data has made NLP tasks like sentiment analysis and opinion mining much more important. Using sarcasm in texts on social media has become a popular trend lately. Using sarcasm reverses the meaning and polarity of what is implied by the text which poses challenge for many NLP tasks. The task of sarcasm detection in text is gaining more and more importance for both commercial and security services. We present the first English-Hindi code-mixed dataset of tweets marked for presence of sarcasm and irony where each token is also annotated with a language tag. We present a baseline supervised classification system developed using the same dataset which achieves an average F-score of 78.4 after using random forest classifier and performing 10-fold cross validation. Full paper: pdf Centre for Language Technologies Research Centre |
||||||||
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved. |