IIIT Hyderabad Publications
Towards better Sentence Classification for Morphologically Rich Languages
Authors: Madhuri Tummalapalli, Manoj Chinnakotla, Radhika Mamidi
Conference: 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing-2018)
Location: Hanoi, Vietnam
Date: 2018-03-18
Report no: IIIT/TR/2018/94

Abstract
Many methods have been developed for various sentence classification tasks in English. They usually exploit linguistic resources such as parsers, or rely on large amounts of annotated or unannotated data, which makes them difficult to adapt to other languages. In this paper, we present an evaluation of popular deep learning methods for sentence classification on two morphologically rich Indian languages, Hindi and Telugu. For this purpose, we also created a question classification dataset for Hindi by translating the TREC-UIUC dataset. We show that character-based input can enhance the performance of current classification systems for morphologically rich languages. Finally, we show that our multiInput-CNN variant performs better than our baselines on two out of three tasks in Hindi and Telugu, while giving comparable results on the remaining task.

Centre: Language Technologies Research Centre
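The abstract argues that character-based input helps with morphologically rich languages. One intuition behind this claim is that inflected forms of the same word look like unrelated tokens to a word-level model, but they share many character sequences. The following sketch illustrates that intuition with boundary-padded character trigrams and Jaccard overlap. It is not the authors' multiInput-CNN. The helper names (`char_ngrams`, `jaccard`, `word_jaccard`) and the Hindi example words are illustrative assumptions.

```python
from collections import Counter


def char_ngrams(word: str, n: int = 3) -> Counter:
    """Count character n-grams of `word`, padded with '<' and '>' boundary markers.

    Illustrative helper only; it is not taken from the paper.
    """
    padded = f"<{word}>"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def jaccard(a: Counter, b: Counter) -> float:
    """Jaccard similarity of the n-gram sets (counts are ignored)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def word_jaccard(w1: str, w2: str) -> float:
    """Similarity when each word is an atomic vocabulary item (word-level view)."""
    return jaccard(Counter([w1]), Counter([w2]))


if __name__ == "__main__":
    # Hypothetical example: Hindi 'book', 'books', 'books (oblique case)'.
    forms = ["किताब", "किताबें", "किताबों"]
    base = forms[0]
    for other in forms[1:]:
        print(
            f"{base} vs {other}: "
            f"word-level={word_jaccard(base, other):.2f}, "
            f"char-trigram={jaccard(char_ngrams(base), char_ngrams(other)):.2f}"
        )
```

At the word level, the three forms share nothing. At the character level, they overlap substantially through the shared stem. A model that sees character input can exploit this overlap, which matters more when a language produces many inflected forms per lemma.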
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.