IIIT Hyderabad Publications
Building a Free Infra-structure for South-Asian Languages

Authors: Akshar Bharati, Vineet Chaitanya, Amba P Kulkarni, Rajeev Sangal
Conference: Proc. of SAARC conference on Multi-lingual and Multi-media Information Technology, CDAC, Pune, 1-4 Sept. 1998 (Keynote lecture)
Date: 1998-09-04
Report no: IIIT/TR/1998/3

Abstract

It does not need to be stressed that it is important to provide knowledge and information in electronic form in South Asian (SA) languages. This task requires the development of software for searching texts, script conversion, dictionaries, spelling checkers, multi-lingual access software, etc., and, of course, a rich collection of texts in electronic form. All this can be called the infra-structure for language. There are a number of problems which need to be addressed. (1) Very few word processors follow any standard coding scheme when texts are entered in SA languages. This renders the texts unusable across platforms. Even if another user has the right platform, the only thing he can usually do is view the text; normally he cannot even annotate it using the keyboard. While the long-term solution is for everybody to follow the ACII standard, in the short term there is a need to develop code converters rapidly. This task has been automated to a large extent for Devanagari. The same should be done for other scripts. (2) The technical feasibility of multi-lingual access software has been demonstrated (though machine translation technology is still far away). Anusaaraka systems for accessing texts in five SA languages are under development, and alpha versions of some have been released. This task can be taken up on a wider scale, covering all SA languages. The systems already built can also be refined further. (3) Electronic texts and resources such as dictionaries, thesauri, and lexical databases are urgently needed. These can be prepared by the collective effort of myriads of people.
In this paper, we argue that the SA language infra-structure can best be developed through a large cooperative effort. The GPL

Full paper: pdf
Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.