IIIT Hyderabad Publications
Unity in Diversity: A unified parsing strategy for major Indian languages

Authors: Juhi Tandon, Dipti Misra Sharma
Conference: International Conference on Dependency Linguistics (Depling 2017)
Location: Pisa, Italy
Date: 2017-09-18
Report no: IIIT/TR/2017/118

Abstract

This paper presents our work on applying a non-linear neural network to parse five resource-poor Indian languages belonging to two major language families, Indo-Aryan and Dravidian. Bengali and Marathi are Indo-Aryan languages, whereas Kannada, Telugu and Malayalam belong to the Dravidian family. While a little work has previously been done on linear transition-based parsing for Bengali and Telugu, we present one of the first parsers for Marathi, Kannada and Malayalam. All of these Indian languages have free word order and range from moderately to very rich in morphology. In this work we therefore propose using linguistically motivated morphological features (suffix and postposition) in the non-linear framework to capture the intricacies of both language families. We also capture chunk information and gender, number and person information in this model. We put forward ways to represent these features cost-effectively using monolingual distributed embeddings. Instead of relying on expensive morphological analyzers to extract the information, these embeddings are used to increase parsing accuracy for resource-poor languages. Our experiments compare the two language families with respect to the importance of the different morphological features. Part-of-speech taggers and chunkers for all five languages are also built in the process.

Full paper: pdf

Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.