IIIT Hyderabad Publications
Generating Converters between Fonts Semi-automatically

Authors: Akshar Bharati, Nisha Sangal, Vineet Chaitanya, Rajeev Sangal, G Uma Maheshwara Rao

Conference: Proc. of SAARC conference on Multi-lingual and Multi-media Information Technology, CDAC, Pune, 1-4 Sept. 1998b

Date: 1998-10-01

Report no: IIIT/TR/1998/2

Abstract

It is important for us to be able to view, as well as search and perform other operations on, texts in Indian languages available over the world wide web (or on floppies or CDs), independent of the hardware or software platform. This is currently difficult because most sites carrying Indian language texts do not follow any coding standard. While the long-term answer might be for everyone to switch to a standard alphabetic coding scheme (ACII), some tools have been developed for immediate use that allow texts in glyph coding to be converted to the standard, automatically or semi-automatically.

In this paper, a system is described which takes (i) a text in an unknown coding scheme and (ii) the same text in the ACII coding scheme, and generates a converter between the unknown coding scheme and ACII. The converter can be used to convert a text from the non-standard coding scheme to ACII, and back. The generated converters can then be refined progressively by hand.

Generating the converter also requires a glyph grammar for the script of the language, which specifies the possible glyph sequences that make up an akshara. The grammar is structured and is independent of the coding schemes, so it needs to be developed only once per script. A sample glyph grammar for Devanagari has been described. A system has been implemented for the Devanagari script, using which converters for at least two sample scripts have been generated semi-automatically.

Full paper: pdf

Centre for Language Technologies Research Centre
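The idea in the abstract of deriving a converter from the same text in two coding schemes can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes both texts have already been segmented into akshara-sized units (the step the paper assigns to the glyph grammar) and that the two segmentations line up one to one. The function names `learn_converter` and `convert`, and the sample codes in the usage note, are hypothetical.

```python
from collections import Counter, defaultdict


def learn_converter(glyph_units, acii_units):
    """Build forward and backward tables from two aligned unit sequences.

    Assumption: unit i of the glyph-coded text corresponds to unit i of
    the ACII text (segmentation is done elsewhere, e.g. by a glyph grammar).
    """
    if len(glyph_units) != len(acii_units):
        raise ValueError("parallel texts must have the same number of units")

    # Count every ACII unit seen opposite each glyph-coded unit.
    votes = defaultdict(Counter)
    for glyph, acii in zip(glyph_units, acii_units):
        votes[glyph][acii] += 1

    forward = {}    # glyph-coded unit -> ACII unit
    conflicts = {}  # units with more than one candidate, for manual review
    for glyph, counter in votes.items():
        forward[glyph] = counter.most_common(1)[0][0]
        if len(counter) > 1:
            conflicts[glyph] = dict(counter)

    # Invert the table for the reverse direction; the first glyph unit
    # seen for a given ACII unit is kept.
    backward = {}
    for glyph, acii in forward.items():
        backward.setdefault(acii, glyph)

    return forward, backward, conflicts


def convert(units, table, unknown="?"):
    """Convert a sequence of units, marking units absent from the table."""
    return [table.get(unit, unknown) for unit in units]
```

A usage example with made-up codes: `learn_converter(["A", "BC", "A"], ["ka", "ki", "ka"])` returns a forward table, a backward table, and an empty conflict report. Any entries in the conflict report are the places where a person would refine the generated converter by hand, in the spirit of the semi-automatic process described above.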
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.