An English-Hindi Code-Mixed Corpus: Stance Annotation and Baseline System

Authors: Sahil Swami, Ankush Khandelwal, Vinay Singh, Syed S. Akhtar, Manish Shrivastava
Conference: 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2018)
Location: Hanoi, Vietnam
Date: 2018-03-18
Report no: IIIT/TR/2018/119

Abstract

Social media has become one of the main channels through which people communicate and share their views with society. From these views we can often detect whether a person is in favor of, against, or neutral towards a given topic. Such opinions are of considerable use to companies. We present a new dataset of 3545 English-Hindi code-mixed tweets expressing opinions on Demonetisation, which was implemented in India in 2016 and was followed by a large countrywide debate. We also present a baseline supervised classification system for stance detection, developed on the same dataset, that uses various machine learning techniques to achieve an accuracy of 58.7% under 10-fold cross-validation.
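
The evaluation protocol named in the abstract, 10-fold cross-validation, can be sketched roughly as follows. This is a minimal illustration and not the authors' system: the majority-class trainer is only a placeholder for the unspecified machine learning techniques, all function names are hypothetical, and the toy labels are made up rather than drawn from the corpus. Folds are contiguous and unshuffled for simplicity; a real experiment would usually shuffle or stratify first.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A "trainer" takes training texts and labels and returns a predict function.
Predictor = Callable[[str], str]
Trainer = Callable[[Sequence[str], Sequence[str]], Predictor]


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train_idx, test_idx))
        start += size
    return splits


def majority_class_trainer(texts: Sequence[str], labels: Sequence[str]) -> Predictor:
    """Placeholder classifier (an assumption): always predicts the most frequent training label."""
    most_common = Counter(labels).most_common(1)[0][0]
    return lambda text: most_common


def cross_validated_accuracy(
    texts: Sequence[str], labels: Sequence[str], train: Trainer, k: int = 10
) -> float:
    """Train on k-1 folds, test on the held-out fold, and average the k fold accuracies."""
    fold_accuracies = []
    for train_idx, test_idx in k_fold_indices(len(texts), k):
        predict = train([texts[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(texts[i]) == labels[i] for i in test_idx)
        fold_accuracies.append(correct / len(test_idx))
    return sum(fold_accuracies) / len(fold_accuracies)


# Made-up toy data: 20 placeholder "tweets" with two of the three stance labels.
toy_texts = [f"tweet {i}" for i in range(20)]
toy_labels = ["favor"] * 12 + ["against"] * 8

toy_accuracy = cross_validated_accuracy(toy_texts, toy_labels, majority_class_trainer)
print(f"mean 10-fold accuracy on toy data: {toy_accuracy:.3f}")
```

Swapping `majority_class_trainer` for any function with the same shape (training texts and labels in, a predict function out) would evaluate a different classifier under the same protocol.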


Centre: Language Technologies Research Centre
