Ranking Multilingual Documents using Minimal Language Dependent Resources

Authors: GSK Santosh, Kiran Kumar, Vasudeva Varma
Conference: 12th International Conference on Intelligent Text Processing and Computational Linguistics
Location: Tokyo, Japan
Date: 2011-02-20
Report no: IIIT/TR/2011/2

Abstract

This paper proposes an approach for extracting simple and effective features that enhance multilingual document ranking (MLDR). There is limited prior research on capturing the concept of multilingual document similarity in determining the ranking of documents. Moreover, the available literature relies heavily on language-specific tools, making those approaches hard to reimplement for other languages. Our approach extracts various multilingual and monolingual similarity features using a basic language resource (a bilingual dictionary). No language-specific tools are used, which makes this approach extensible to other languages. We used the datasets provided by the Forum for Information Retrieval Evaluation (FIRE) for their 2010 Adhoc Cross-Lingual document retrieval task on Indian languages. Experiments were performed with different ranking algorithms, and their results are compared. The results obtained showcase the effectiveness of the features considered in enhancing multilingual document ranking.

Full paper: pdf

Centre for Search and Information Extraction Lab

IIIT Hyderabad Publications
