IIIT Hyderabad Publications
More Accurate Fuzzy Text Search for Languages Using Abugida Scripts

Authors: Anil Kumar Singh, Harshit Surana, Karthik Gali
Conference: ACM SIGIR Workshop on Improving Web Retrieval for Non-English Queries, SIGIR, Amsterdam, Netherlands, 2007
Date: 2007-10-02
Report no: IIIT/TR/2007/69

Abstract

Text search is a key step in any kind of information access. To do it effectively, we can use knowledge about the writing systems concerned. Methods based on such knowledge can give significantly better results for searching text, at least for some languages. This can improve information retrieval in particular and information access in general. In this paper, we present a method for fuzzy text search for languages that use Abugida scripts, such as Hindi, Bengali, Telugu, Amharic, and Thai. We use characteristics of the writing system for fuzzy search and are able to handle spelling variation, which is very common in these languages. Our method shows an improvement in F-measure of up to 30% over scaled edit distance.

Full paper: pdf
Centre: Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.