IIIT Hyderabad Publications
Exploiting Wikipedia Categorization for Predicting Age and Gender of Blog Authors

Authors: Santosh Kosgi, Aditya Joshi, Manish Gupta, Vasudeva Varma
Conference: The 22nd Conference on User Modeling, Adaptation and Personalization
Location: Aalborg, Denmark
Date: 2014-07-07
Report no: IIIT/TR/2014/61

Abstract

For privacy reasons, personally identifiable information such as the age and gender of people is not publicly available. However, accurate prediction of such information has important applications in advertising, forensics and business intelligence. Existing methods for this problem have focused on learning classifiers from content-based features such as word n-grams and style-based features such as part-of-speech (POS) n-grams. Previous approaches have two major drawbacks: (1) they do not consider the semantic relations between words, and (2) they do not handle polysemy. We propose a novel method to address these drawbacks by representing the document using Wikipedia concepts and category information. Experimental results show that classifiers learned using these features together with previously used features achieve significantly better accuracy than state-of-the-art methods. Indeed, feature selection shows that our novel features are more effective than previously used content-based features.

Full paper: pdf

Centre for Search and Information Extraction Lab
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.