IIIT Hyderabad Publications
A Dataset of Hindi-English Code-Mixed Social Media Text for Hate Speech Detection

Authors: Aditya Bohra, Deepanshu Vijay, Vinay Singh, Syed S. Akhtar, Manish Shrivastava
Conference: 16th Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-2018)
Location: New Orleans, USA
Date: 2018-06-01
Report no: IIIT/TR/2018/79

Abstract

Hate speech detection in social media texts is an important Natural Language Processing task, which has several crucial applications like sentiment analysis, investigating cyberbullying and examining socio-political controversies. While relevant research has been done independently on code-mixed social media texts and hate speech detection, our work is the first attempt at detecting hate speech in Hindi-English code-mixed social media text. In this paper, we analyze the problem of hate speech detection in code-mixed texts and present a Hindi-English code-mixed dataset consisting of tweets posted online on Twitter. The tweets are annotated with the language at word level and the class they belong to (Hate Speech or Normal Speech). We also propose a supervised classification system for detecting hate speech in the text using various character-level, word-level, and lexicon-based features.

Full paper: pdf
Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.