IIIT Hyderabad Publications
Title: Interpreting the Syntactic and Social Elements of the Tweet Representations via Elementary Property Prediction Tasks
Authors: ganesh.j, Manish Gupta, Vasudeva Varma
Conference: The Thirtieth Annual Conference on Neural Information Processing Systems (NIPS)
Location: Centre Convencions Internacional Barcelona, Barcelona, Spain
Date: 2016-12-05
Report no: IIIT/TR/2016/49

Abstract: Research in social media analysis has recently seen a surge in the number of works applying representation learning models to solve high-level syntactico-semantic tasks such as sentiment analysis [1], semantic textual similarity computation [2], hashtag prediction [3], and so on. Though representation learning models outperform traditional models on all these tasks, little is known about the core properties of a tweet encoded within the representations. In a recent work, Hill et al. [4] compare different sentence representation models by evaluating them on high-level semantic tasks such as paraphrase identification, sentiment classification, question answering, document retrieval, and so on. This type of coarse-grained analysis is opaque, as it does not clearly reveal the kind of information encoded by the representations. Our work presented here constitutes the first step in opening the black box of vector embeddings for social media posts, particularly tweets. Essentially we ask the following question: "What are the core properties encoded in the given tweet representation?" We group these properties into two categories: syntactic and social. The syntactic category includes properties such as tweet length, the order of words in it, the words themselves, slang words, hashtags, and named entities in the tweet. The social properties, on the other hand, consist of 'is reply' and 'reply time'. We investigate the degree to which the tweet representations encode these properties.
We assume that if we cannot train a classifier to predict a property from a tweet's representation, then this property is not encoded in that representation. For example, a model that preserves tweet length should perform well at predicting the length given the representation it generates. Though these elementary property prediction tasks are not directly related to any downstream application, knowing that a model is good at capturing a particular property (e.g., the social properties) indicates that it could excel in correlated applications (e.g., the user profiling task). In this work we perform an extensive evaluation of 9 unsupervised and 4 supervised tweet representation models, using 8 different properties. The most relevant work is that of Adi et al. [5], which investigates three sentence properties to compare unsupervised sentence representation models such as the average of word vectors and LSTM auto-encoders. We differ from their work in two ways: (1) while they focus on sentences, we focus on social media posts, which opens up the challenge of considering multiple salient properties such as hashtags, named entities, conversations, and so on; (2) while they work with only unsupervised representation learning models, we investigate traditional unsupervised methods (BOW, LDA), unsupervised representation learning methods (Siamese CBOW, Tweet2Vec), as well as supervised methods (CNN, BLSTM). Our main contributions are summarized below.
• Our work is the first towards interpreting tweet embeddings in a fine-grained fashion. To this end, we propose a set of tweet-specific elementary property prediction tasks which help in unearthing the basic characteristics of different tweet representations.
• To the best of our knowledge, this work is the first to do a holistic study of traditional, unsupervised, and supervised representation learning models for tweets.
• We compare various tweet representations with respect to these properties across various dimensions such as tweet length and word-order sensitivity.
The paper is organized as follows. Sections 2 and 3 discuss the set of proposed elementary property prediction tasks and the models considered for this study, respectively. Sections 4 and 5 present the experimental setup and result analysis, respectively. We conclude the work with a brief summary in Section 6.
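The property prediction methodology described in the abstract can be illustrated with a minimal probing-classifier sketch. The data below is entirely synthetic (the embeddings, the injected signal, and the length bins are assumptions for illustration, not the paper's actual models or datasets): fixed "tweet embeddings" are probed with a linear softmax classifier to test whether a simple property (here, binned tweet length) is linearly recoverable from them.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_synthetic_data(n=600, dim=50, n_bins=3):
    """Synthetic 'tweet embeddings' whose first few coordinates leak
    the tweet-length bin, standing in for representations produced by
    a model that encodes the length property."""
    labels = rng.integers(0, n_bins, size=n)
    X = rng.normal(size=(n, dim))
    X[np.arange(n), labels] += 2.0  # inject a linearly recoverable signal
    return X, labels

def train_softmax_probe(X, y, n_classes, lr=0.1, epochs=200):
    """Plain softmax regression (linear probe) trained with
    batch gradient descent; the embeddings themselves stay frozen."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[y]
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(X)
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b

def accuracy(W, b, X, y):
    return float((np.argmax(X @ W + b, axis=1) == y).mean())

X, y = make_synthetic_data()
X_tr, y_tr, X_te, y_te = X[:500], y[:500], X[500:], y[500:]
W, b = train_softmax_probe(X_tr, y_tr, n_classes=3)
print(f"probe accuracy: {accuracy(W, b, X_te, y_te):.2f}")
```

High held-out probe accuracy is read as evidence that the property is encoded in the representation; accuracy near the majority-class baseline suggests it is not. The same recipe applies to any of the eight properties by swapping the label being predicted.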