IIIT Hyderabad Publications
ABC Algorithm for URL Extraction (LNCS)
Authors: LALIT Mohan Mohan, Sourav Sarangi, Y. Raghu Babu Reddy
Conference: Practi-O-web (ICWE 2017)
Date: 2017-06-05
Report no: IIIT/TR/2017/69

Abstract
Seed URLs, content classification, indexing and ranking are key factors in the relevance of search results. Domain-specific search engines (DSSEs) return more relevant results because they face fewer ambiguity issues. Wider use of DSSEs requires identifying seed URLs and their related child URLs. Seed URL identification has so far been manual, so deciding whether enough URLs are available to build a DSSE takes a long time. We propose using the nature-inspired Artificial Bee Colony (ABC) algorithm to identify and score seed and child URLs. We applied the algorithm to the 'Security' domain, extracting 34,007 seed URLs from a Wikipedia data dump and 323,488 child URLs from those seed URLs. The volume and relevance of the extracted URLs make it easy to decide whether building a DSSE is worthwhile.

Centre for Software Engineering Research Lab
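The abstract names the Artificial Bee Colony (ABC) algorithm but does not say how food sources, fitness or neighbourhood moves are mapped onto URLs. As a hedged illustration only, the sketch below shows the generic ABC loop (employed, onlooker and scout bee phases) minimizing a toy continuous objective. It is not the authors' URL-scoring method; the function name `abc_minimize`, its parameters and the `sphere` objective are hypothetical choices made for this example.

```python
import random
from typing import Callable, List, Tuple


def abc_minimize(
    f: Callable[[List[float]], float],
    dim: int,
    lower: float,
    upper: float,
    n_sources: int = 10,
    limit: int = 20,
    max_cycles: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Generic ABC minimizer (hypothetical sketch, not the paper's URL variant)."""
    rng = random.Random(seed)

    def random_source() -> List[float]:
        return [rng.uniform(lower, upper) for _ in range(dim)]

    def fitness(value: float) -> float:
        # Common ABC fitness transform for minimization problems.
        return 1.0 / (1.0 + value) if value >= 0 else 1.0 + abs(value)

    sources = [random_source() for _ in range(n_sources)]
    values = [f(s) for s in sources]
    trials = [0] * n_sources
    best_i = min(range(n_sources), key=lambda i: values[i])
    best, best_val = sources[best_i][:], values[best_i]

    def try_neighbour(i: int) -> None:
        # Perturb one dimension of source i relative to a random partner k.
        nonlocal best, best_val
        k = rng.choice([j for j in range(n_sources) if j != i])
        d = rng.randrange(dim)
        candidate = sources[i][:]
        candidate[d] += rng.uniform(-1.0, 1.0) * (sources[i][d] - sources[k][d])
        candidate[d] = min(max(candidate[d], lower), upper)
        cand_val = f(candidate)
        if fitness(cand_val) > fitness(values[i]):  # greedy selection
            sources[i], values[i], trials[i] = candidate, cand_val, 0
            if cand_val < best_val:
                best, best_val = candidate[:], cand_val
        else:
            trials[i] += 1

    for _ in range(max_cycles):
        # Employed bees: one neighbourhood trial per food source.
        for i in range(n_sources):
            try_neighbour(i)
        # Onlooker bees: pick sources with probability proportional to fitness.
        fits = [fitness(v) for v in values]
        for _ in range(n_sources):
            try_neighbour(rng.choices(range(n_sources), weights=fits)[0])
        # Scout bee: abandon at most one source whose trials exceed the limit.
        worst = max(range(n_sources), key=lambda i: trials[i])
        if trials[worst] > limit:
            sources[worst] = random_source()
            values[worst] = f(sources[worst])
            trials[worst] = 0
            if values[worst] < best_val:
                best, best_val = sources[worst][:], values[worst]

    return best, best_val


def sphere(x: List[float]) -> float:
    # Toy objective with its minimum of 0 at the origin.
    return sum(v * v for v in x)


if __name__ == "__main__":
    print(abc_minimize(sphere, dim=2, lower=-5.0, upper=5.0))
```

Here `limit` controls how many failed improvement attempts a food source tolerates before a scout replaces it, which is how ABC balances exploiting good sources against exploring new regions. Adapting this loop to seed and child URLs would require a URL representation and a relevance-based objective that the abstract does not specify.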
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.