CHARACTERIZING AND DETECTING LIVESTREAMING CHATBOTS

Author: SHREYA JAIN
Date: 2020-07-09
Report no: IIIT/TH/2020/66
Advisor: Ponnurangam Kumaraguru

Abstract

Livestreaming platforms enable content producers, or streamers, to broadcast creative content to a potentially large viewer base. Chatrooms form an integral part of such platforms, enabling viewers to interact both with the streamer and amongst themselves. Streams with many viewers and many active chatters are typically considered engaging. Such streams are often promoted to end users by recommendation algorithms and are exposed to better monetization opportunities via revenue share from platform advertising, viewer donations, and third-party sponsorships. Given these incentives, some streamers use fraudulent means to inflate perceived engagement, simulating chatter with fake "chatbots" that can be purchased from online marketplaces. This inorganic engagement can distort recommendations, erode streamer and viewer trust in the platform, and harm monetization for honest streamers. In this study, we tackle the novel problem of automatically detecting chatbots on livestreaming platforms. To this end, we first formalize the livestreaming chatbot detection problem and characterize differences between botted and genuine chatter behaviour observed in a real-world livestreaming chatter dataset collected from Twitch.tv. We then propose the SHERLOCK and BOTHUNT methods, which together form a two-stage approach: first detecting chatbotted streams, and then detecting the constituent chatbots within them. Finally, we demonstrate effectiveness on both real and synthetic data. For the latter, we propose a novel strategy for collecting labeled synthetic chatter datasets (typically unavailable) from such platforms, enabling evaluation of the proposed detection approaches against chatbot behaviours with varying signatures. SHERLOCK achieves 97% precision and recall on the real-world dataset and over 80% F1 score across most simulated attack settings, while BOTHUNT achieves 86% accuracy on the real-world dataset and 93% accuracy across all attack settings.
This thesis is a timely contribution to computer science, especially to combating astroturfing, which is needed to mitigate the spread of fraudulent bot users on livestreaming platforms. Its results can be used to build real-world solutions that mitigate the spread of untrustworthy or botted streams, fake users, and similar abuse on livestreaming platforms in the future.
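The two-stage structure described in the abstract (first flag chatbotted streams, then identify the individual chatbots inside them) can be illustrated with a minimal Python sketch. It is not SHERLOCK or BOTHUNT: the abstract does not specify their features or models, so the stream-level signal (chatter-to-viewer ratio), the chatter-level signal (regularity of inter-message gaps), the thresholds, and all names (`Stream`, `Chatter`, `stream_is_suspicious`, `chatter_is_bot`, `detect`) are hypothetical placeholders chosen only to show how the second stage operates on the output of the first.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Chatter:
    name: str
    message_times: list[float]  # seconds since stream start


@dataclass
class Stream:
    stream_id: str
    viewers: int
    chatters: list[Chatter]


# Hypothetical thresholds, for illustration only (not taken from the thesis).
STREAM_RATIO_THRESHOLD = 0.5  # chatters per viewer treated as suspicious
REGULARITY_THRESHOLD = 0.1    # coefficient of variation of inter-message gaps


def stream_is_suspicious(stream: Stream) -> bool:
    """Stage 1 (hypothetical signal): flag streams with an unusually high chatter-to-viewer ratio."""
    if stream.viewers == 0:
        return False
    return len(stream.chatters) / stream.viewers > STREAM_RATIO_THRESHOLD


def chatter_is_bot(chatter: Chatter) -> bool:
    """Stage 2 (hypothetical signal): flag chatters whose messages arrive at near-constant intervals."""
    times = sorted(chatter.message_times)
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    if len(gaps) < 2:
        return False  # too few messages to judge regularity
    avg_gap = mean(gaps)
    if avg_gap == 0:
        return True  # bursts of simultaneous messages
    return pstdev(gaps) / avg_gap < REGULARITY_THRESHOLD


def detect(streams: list[Stream]) -> dict[str, list[str]]:
    """Run the per-chatter stage only on streams flagged by the per-stream stage."""
    flagged: dict[str, list[str]] = {}
    for stream in streams:
        if stream_is_suspicious(stream):
            flagged[stream.stream_id] = [c.name for c in stream.chatters if chatter_is_bot(c)]
    return flagged
```

For example, a chatter posting exactly every 10 seconds in a stream with 3 viewers and 2 chatters would be returned under that stream's ID, while the same chatters in a stream with 100 viewers would not be examined at all. Restricting the per-chatter stage to flagged streams keeps the more detailed analysis focused on a smaller candidate set, which is one plausible motivation for a two-stage design.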


Centre: Language Technologies Research Centre

IIIT Hyderabad Publications
