IIIT Hyderabad Publications
Semi-automated annotated treebank construction for Hindi and Urdu

Authors: Jayendra Rakesh Yeka, Ramagurumurthy Vishnu, Dipti Misra Sharma
Conference: LREC-2014: 2nd Workshop on Indian Language Data: Resources and Evaluation (WILD RE) (LREC-2014: WILD RE Workshop 2014)
Date: 2014-05-27
Report no: IIIT/TR/2014/63

Abstract
In this paper, we describe the structure and paradigms chosen for the creation of annotated corpora for Hindi and Urdu. We briefly discuss the Shakti Standard Format, which was chosen to suit the needs of Indian language dependency annotation. The paper aims to present a framework for the creation of an annotated corpus. We then discuss the automation methods chosen to reduce the laborious and time-consuming process of corpus annotation, and the methods chosen to handle the errors and multiple analyses that arise during annotation. We also present the methods, both manual and automated, used to ensure the quality of the treebank. Finally, we report the current status of the annotated corpora.

Full paper: pdf
Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.