IIIT Hyderabad Publications
ET: EVENTS FROM TWEETS

Author: Ruchi Parikh
Date: 2017-07-29
Report no: IIIT/TH/2017/52
Advisor: Kamalakar Karlapalem

Abstract

Social media sites such as Twitter and Facebook have emerged as popular tools for people to express their opinions and sentiments on various topics. The large amount of data these sites generate is extremely valuable for mining trending topics and events. However, this massive volume also dictates that the mining approach be efficient in terms of computation and storage. In this thesis, we propose an efficient, scalable system to detect events from tweets. The system does not employ any Twitter-specific features and can thus be readily adapted to any other social media site. Since tweets are very short and diverse in nature, traditional approaches meant for typical long, well-formatted documents cannot be applied to them. Tweets are written informally, with many abbreviations and mistakes, and their brevity means that statistical measures such as tf.idf cannot be used on them directly. Our approach detects events by exploring their textual and temporal components. The system does not require any target entity to be specified; it automatically detects generic events from a set of tweets. The key components of our system are an extraction scheme for event-representative keywords, an efficient storage mechanism for their appearance patterns, and a hierarchical clustering technique based on the common co-occurring features of keywords. We evaluate our system on two datasets, one provided by the VAST Challenge 2011 and the other consisting of tweets published by US-based users in January 2013. Our approach is efficient in terms of computational time and memory, and achieves high precision on both datasets. The detected events are easy to interpret and encompass a wide range of topics.

Full thesis: pdf

Centre for Data Engineering
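
The abstract names three components: keyword extraction, storage of keyword appearance patterns, and hierarchical clustering on co-occurrence. The Python sketch below is only an illustration of that general idea, not the method used in the thesis. Everything in it is an assumption made for the example: the function names, the stopword list, the one-hour time window, the use of tweet-index sets as "appearance patterns", and Jaccard similarity with a fixed threshold as the merge criterion.

```python
from collections import defaultdict

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = frozenset({"the", "a", "an", "in", "at", "is", "on", "of", "and", "to"})


def appearance_patterns(texts, min_count=2):
    """Map each keyword to the set of tweet indices it appears in.

    Keywords seen in fewer than `min_count` tweets are dropped.
    """
    patterns = defaultdict(set)
    for idx, text in enumerate(texts):
        for raw in text.lower().split():
            token = raw.strip(".,!?#@:;\"'")
            if token and token not in STOPWORDS:
                patterns[token].add(idx)
    return {kw: ids for kw, ids in patterns.items() if len(ids) >= min_count}


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def cluster_keywords(patterns, threshold=0.5):
    """Agglomerative clustering: repeatedly merge the two clusters whose
    tweet-index sets overlap most, until no pair reaches `threshold`."""
    clusters = [({kw}, set(ids)) for kw, ids in sorted(patterns.items())]
    while len(clusters) > 1:
        best, pair = 0.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = jaccard(clusters[i][1], clusters[j][1])
                if sim > best:
                    best, pair = sim, (i, j)
        if pair is None or best < threshold:
            break
        i, j = pair
        merged = (clusters[i][0] | clusters[j][0], clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in pair] + [merged]
    return [keywords for keywords, _ in clusters]


def detect_events(tweets, window_seconds=3600):
    """tweets: iterable of (timestamp_in_seconds, text) pairs.

    Buckets tweets into fixed time windows, then clusters keywords
    within each window. Returns {window_index: [keyword sets]} keeping
    only clusters with at least two keywords.
    """
    windows = defaultdict(list)
    for timestamp, text in tweets:
        windows[int(timestamp // window_seconds)].append(text)
    events = {}
    for window, texts in windows.items():
        groups = cluster_keywords(appearance_patterns(texts))
        groups = [g for g in groups if len(g) >= 2]
        if groups:
            events[window] = groups
    return events
```

Calling `detect_events` with a list of `(timestamp, text)` pairs returns, for each time window, the groups of keywords that tend to appear in the same tweets. The pairwise search is quadratic per merge, so this form is only suitable for small inputs; a system aiming at the efficiency the abstract describes would need a more careful data structure.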