IIIT Hyderabad Publications
Seed Selection for Domain-Specific Search
Authors: Nikhil Priyatam, Ajay Dubey, Krish Perumal, Sai Praneeth, Kakadia Dharmesh, Vasudeva Varma
Conference: The 23rd International World Wide Web Conference
Location: COEX, Samsung-dong, Gangnam-gu, Seoul 135-731, Korea
Date: 2014-04-07
Report no: IIIT/TR/2014/17

Abstract
The last two decades have witnessed an exponential rise in web content from a plethora of domains, which has necessitated the use of domain-specific search engines. Diversity of crawled content is one of the crucial aspects of a domain-specific search engine. To a large extent, diversity is governed by the initial set of seed URLs. Most existing approaches rely on manual effort for seed selection. In this work we automate this process using URLs posted on Twitter. We propose an algorithm to get a set of diverse seed URLs from a Twitter URL graph. We compare the performance of our approach against the baseline zero-similarity seed selection method and find that our approach beats the baseline by a significant margin.

Full paper: pdf
Centre for Search and Information Extraction Lab
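The abstract does not spell out how diverse seeds are chosen from the Twitter URL graph. Purely as an illustration of the general idea, here is a minimal sketch of one generic technique, a greedy max-min diversity heuristic, and not the authors' algorithm. It assumes a hypothetical input format: a symmetric dict of similarity weights between URLs (for example, weights derived from tweets that share hashtags). The function name `select_diverse_seeds` is likewise hypothetical.

```python
from typing import Dict, List

# Hypothetical input format (an assumption, not from the paper):
# sim[u][v] is a similarity weight in [0, 1] between URLs u and v.
# Every URL appears as a key, and the weights are assumed symmetric.
SimilarityGraph = Dict[str, Dict[str, float]]


def select_diverse_seeds(sim: SimilarityGraph, k: int) -> List[str]:
    """Greedy max-min diversity: each new seed is the URL whose highest
    similarity to the already-chosen seeds is smallest."""
    # Weighted degree: how strongly each URL is tied to the rest of the graph.
    degree = {u: sum(nbrs.values()) for u, nbrs in sim.items()}
    remaining = set(sim)
    seeds: List[str] = []
    while remaining and len(seeds) < k:
        def score(u: str):
            # Similarity to the closest chosen seed (0.0 before any are chosen).
            closest = max((sim[u].get(s, 0.0) for s in seeds), default=0.0)
            # Prefer low overlap, then high degree, then name for determinism.
            return (closest, -degree[u], u)

        best = min(remaining, key=score)
        seeds.append(best)
        remaining.remove(best)
    return seeds


if __name__ == "__main__":
    sim = {
        "a.com": {"b.com": 0.9, "c.com": 0.1},
        "b.com": {"a.com": 0.9, "c.com": 0.2},
        "c.com": {"a.com": 0.1, "b.com": 0.2},
        "d.com": {},
    }
    print(select_diverse_seeds(sim, 3))  # ['b.com', 'd.com', 'c.com']
```

In this toy graph the best-connected URL is picked first, and `a.com` is left out because it is nearly a duplicate of `b.com`. Starting from the highest-degree node is a design choice of this sketch, so that the first seed is well connected. The max-min rule then keeps each later seed as far as possible from the earlier ones.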
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.