CROSS LANGUAGE INFORMATION ACCESS IN TELUGU

Authors: Vasudeva Varma, Aditya Mogadala, V. Srikanth Reddy, Ram Bhupal Reddy
Conference: SiliconAndhra Conference (Global Internet Forum for Telugu)

Date: 2011-09-28
Report no: IIIT/TR/2011/110

Abstract

This paper describes a large-scale system for Cross Language Information Access (CLIA), which helps users access information available in a language different from the one in which the query or information need is expressed. We focus specifically on a Telugu to English and Hindi CLIA system: for a query given in Telugu, the system retrieves information available in English and Hindi. It also tries to present that information back in Telugu using several information access technologies, such as Information Extraction, Summarization and Machine Translation. The system was developed at IIIT Hyderabad in the context of a larger project funded by the Ministry of Communication and Information Technology and executed by a consortium of ten universities and research organizations, covering six Indian languages for input and English and Hindi as target languages. CLIA systems can be considered an extension of Cross Language Information Retrieval (CLIR) systems: they further process the results obtained by the CLIR system in the target language. Users unfamiliar with the language of the documents returned by CLIR are often unable to extract relevant information from them. This requires further processing, which might include producing a summary of the multiple documents retrieved, translating that summary back to the source (query) language, extracting structured information from the retrieved documents and producing human-consumable information nuggets, and translating the entire document or its relevant portions. To address these needs and build efficient CLIA systems, one requires a fully automatic, high-quality translation system at different levels of information processing. This paper details the pre-retrieval and post-retrieval processing tasks and the challenges they involve, such as query translation, language identification, and the software engineering issues of building end-to-end systems.

Full paper: pdf

Centre for Search and Information Extraction Lab

IIIT Hyderabad Publications
