Timespent Based Models for Predicting User Retention

Authors: Kushal Dave, Vishal Vaingankar, Sumanth Kolar, Vasudeva Varma
Conference: 22nd International World Wide Web Conference
Location: Windsor Barra Hotel, Rio de Janeiro, Brazil
Date: 2013-05-13
Report no: IIIT/TR/2013/22

Abstract

Content discovery is fast becoming the preferred tool for user engagement on the web. Discovery allows users to be educated and entertained about their topics of interest. StumbleUpon is the largest personalized content discovery engine on the Web, delivering more than 1 billion personalized recommendations per month. As a recommendation system, one of the primary metrics we track is whether the user returns to use the product (retention) after their initial experience (session) with StumbleUpon. In this paper, we address the problem of predicting user retention from the user's previous sessions. The paper first explores the user and content features that help predict user retention. This involves mapping each user and the user's recommendations (stumbles) into a descriptive feature space, with features such as the time spent by the user, the number of stumbles, and content features of the recommendations. To model the diversity in user behaviour, we also generated normalized features that account for the user's speed of stumbling. Using these features, we built a decision tree classifier to predict retention. We find that a model that uses both user and content features achieves higher prediction accuracy than models that use either feature set alone. Further, we used information-theoretic analysis to find the subset of recommendations that is most indicative of user retention. A classifier trained on this subset of recommendations achieves the highest prediction accuracy. This indicates that not every recommendation seen by the user is predictive of whether the user will be retained; instead, a subset of the most informative recommendations is more useful for predicting retention.
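The abstract does not say how the normalized features or the information-theoretic selection are computed. The Python sketch below is illustrative only. It assumes three things: each stumble's timespent is normalized by the user's own median timespent, each stumble position is discretized as "long" or "short", and positions are ranked by their mutual information with the retention label. The function names, the median normalization, and the 1.0 threshold are all hypothetical choices. They are not the authors' method.

```python
import math
from collections import Counter
from statistics import median


def normalize_timespent(times):
    """Hypothetical normalization: divide each stumble's timespent by the
    user's median timespent, so fast and slow stumblers become comparable.
    Values > 1 mean the user lingered longer than usual on that stumble."""
    m = median(times)
    return [t / m if m > 0 else 0.0 for t in times]


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        pxy = c / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_stumble_positions(sessions, retained, threshold=1.0):
    """Rank stumble positions by how informative they are about retention.

    sessions: per-user lists of timespent values (one value per stumble).
    retained: 0/1 labels, 1 if the user returned.
    Each position k becomes a binary feature (normalized time >= threshold),
    and positions are returned sorted by descending mutual information.
    """
    depth = min(len(s) for s in sessions)  # only positions every user reached
    normed = [normalize_timespent(s) for s in sessions]
    scores = []
    for k in range(depth):
        feature = [int(s[k] >= threshold) for s in normed]
        scores.append((k, mutual_information(feature, retained)))
    return sorted(scores, key=lambda kv: kv[1], reverse=True)


# Toy data (invented for illustration): four users, three stumbles each.
sessions = [[10, 40, 5], [12, 50, 20], [30, 5, 20], [25, 4, 30]]
retained = [1, 1, 0, 0]
ranking = rank_stumble_positions(sessions, retained)
print(ranking[0])  # position 1 ranks first with 1.0 bit
```

In this toy data, the second stumble alone separates returning users from non-returning ones. That mirrors, in miniature, the abstract's point that some recommendations carry more signal about retention than others. A classifier such as a decision tree could then be trained on only the top-ranked positions.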


Centre for Search and Information Extraction Lab
