IIIT Hyderabad Publications
Towards Identifying Humor and Author's Stance in Code-Mixed Social Media Data

Author: Sushmitha Reddy
Date: 2020-03-20
Report no: IIIT/TH/2020/13
Advisor: Radhika Mamidi

Abstract

Social media sites such as Twitter and Facebook have become people's main channels for connecting with other individuals and the community and for voicing their opinions. Social media platforms are growing in India, where people express themselves in multiple languages, and the open, casual nature of these sites leads more people to write in their native language. The result is a growing amount of code-mixed content, for which annotated data is currently lacking. Because these sites give access to public opinion on almost every subject, a huge amount of user data can be collected that could prove useful to different businesses, making tasks such as opinion mining and sentiment analysis even more essential. Understanding users who write in low-resource languages has therefore recently become a widely researched task.

With the penetration of the Internet into multilingual societies, the linguistic diversity of the real world is now reflected in online communities, and limiting solutions to a set of major languages is no longer viable. This proliferation is relatively recent, however, so the amount of available data in many widely spoken languages is inadequate, and manual annotation of data is a humongous and often expensive task. In this thesis, we work on these widely spoken languages to better understand users who express themselves in their native language on social media platforms. Understanding people broadly means understanding their feelings, opinions, and stance towards a particular target, which brings in the tasks of stance detection and humor detection. We focus mainly on text classification, the process of assigning a text to categories of a given classification scheme based on its content.
We present two new solutions: one for stance detection and one for humor detection in English-Hindi code-mixed tweets. The tweets for stance detection were collected for the target topic 'Demonetisation'. Deep neural network models have been widely used in natural language processing (NLP). Convolutional neural networks (CNNs) and recurrent neural networks (RNNs), the latter able to process sequences of variable length, are well-known methods for sequence modeling tasks. Long short-term memory (LSTM) networks are a category of RNN and have achieved exceptional performance in text classification. Nonetheless, the high dimensionality and sparsity of text data and the dynamic semantics of natural language make text classification challenging. To address these problems, this thesis proposes a novel, unified architecture that combines a bidirectional LSTM (BiLSTM), an attention mechanism, and a convolutional layer.

Centre: Language Technologies Research Centre
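The combination of a BiLSTM, an attention mechanism, and a convolutional layer described in the abstract can be illustrated with a forward pass in plain Python. This is a minimal sketch under stated assumptions, not the thesis's implementation. The layer order (BiLSTM, then attention re-weighting of each hidden state, then convolution with max-over-time pooling and a softmax classifier), the simplified attention score, all sizes, and the names `BiLSTMAttnCNN`, `bilstm`, `attention`, and `conv_maxpool` are hypothetical. Weights are random and untrained, biases are omitted, and each tweet is assumed to arrive as a list of pre-computed token embeddings.

```python
import math
import random

# Hypothetical sizes chosen for illustration; the abstract does not state them.
EMB, HID, FILTERS, WIDTH, CLASSES = 8, 6, 4, 3, 2

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

class LSTM:
    """Single-layer LSTM; one weight matrix per gate acting on [x_t; h_{t-1}] (no biases)."""
    def __init__(self, in_dim, hid, rng):
        self.hid = hid
        self.W = {g: rand_matrix(hid, in_dim + hid, rng) for g in "ifoc"}

    def run(self, xs):
        h, c, outs = [0.0] * self.hid, [0.0] * self.hid, []
        for x in xs:
            z = x + h  # list concatenation gives [x_t; h_{t-1}]
            i = [sigmoid(v) for v in matvec(self.W["i"], z)]    # input gate
            f = [sigmoid(v) for v in matvec(self.W["f"], z)]    # forget gate
            o = [sigmoid(v) for v in matvec(self.W["o"], z)]    # output gate
            g = [math.tanh(v) for v in matvec(self.W["c"], z)]  # candidate cell
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
            outs.append(h)
        return outs

def bilstm(xs, fwd, bwd):
    forward = fwd.run(xs)
    backward = bwd.run(xs[::-1])[::-1]  # re-align backward states with time steps
    return [f + b for f, b in zip(forward, backward)]  # concatenate per time step

def attention(hs, u):
    # Simplified attention (assumption): score_t = u . tanh(h_t), alpha = softmax(scores)
    scores = [sum(uk * math.tanh(hk) for uk, hk in zip(u, h)) for h in hs]
    alpha = softmax(scores)
    return [[a * v for v in h] for a, h in zip(alpha, hs)], alpha

def conv_maxpool(seq, filters):
    # Zero-pad sequences shorter than the filter width.
    dim = len(seq[0])
    if len(seq) < WIDTH:
        seq = seq + [[0.0] * dim] * (WIDTH - len(seq))
    feats = []
    for flt in filters:  # each filter is a WIDTH x dim matrix
        acts = []
        for t in range(len(seq) - WIDTH + 1):
            window = seq[t:t + WIDTH]
            s = sum(sum(w * x for w, x in zip(frow, xrow)) for frow, xrow in zip(flt, window))
            acts.append(max(0.0, s))  # ReLU
        feats.append(max(acts))  # max-over-time pooling
    return feats

class BiLSTMAttnCNN:
    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTM(EMB, HID, rng)
        self.bwd = LSTM(EMB, HID, rng)
        self.u = [rng.uniform(-0.5, 0.5) for _ in range(2 * HID)]
        self.filters = [rand_matrix(WIDTH, 2 * HID, rng) for _ in range(FILTERS)]
        self.out = rand_matrix(CLASSES, FILTERS, rng)

    def predict_proba(self, embeddings):
        hs = bilstm(embeddings, self.fwd, self.bwd)
        weighted, _ = attention(hs, self.u)
        feats = conv_maxpool(weighted, self.filters)
        return softmax(matvec(self.out, feats))

if __name__ == "__main__":
    rng = random.Random(42)
    tweet = [[rng.uniform(-1, 1) for _ in range(EMB)] for _ in range(5)]  # 5 fake token embeddings
    print(BiLSTMAttnCNN().predict_proba(tweet))  # a probability distribution over CLASSES labels
```

Applying attention before the convolution lets the filters operate on a sequence in which tokens scored as informative are amplified. This abstract does not state how the thesis orders or connects the three components, so treat this arrangement as one plausible reading.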
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.