IIIT Hyderabad Publications
Syllables for Sentence Classification in Morphologically Rich Languages

Authors: Madhuri Tummalapalli, Radhika Mamidi
Conference: The 32nd Pacific Asia Conference on Language, Information and Computation (PACLIC 2018)
Location: The Hong Kong Polytechnic University, Hong Kong SAR
Date: 2018-12-01
Report no: IIIT/TR/2018/90

Abstract

Sentence classification is one of the most fundamental tasks in NLP: the aim is to assign a given sentence to one of a pre-defined set of classes. Much work has been done on English in the last few years, with varied methodologies. A large proportion of these works represent the input sentence as a sequence of words; only a few rely on character-level representations. In this work, we introduce a new method for representing a sentence: as a sequence of syllables. As we show, syllables are a better choice for representing the sub-word level information in a sentence, which is essential for morphologically rich languages. We consider the tasks of Sentiment Analysis and Question Classification in three languages of varying morphological richness: English, Hindi and Telugu. Through extensive evaluation, we show that syllables are the best-performing input type, compared to words or characters, for the morphologically rich languages Hindi and Telugu.

Full paper: pdf
Centre: Language Technologies Research Centre
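
The abstract contrasts three input granularities for the same sentence: words, characters, and syllables. The sketch below illustrates that difference on an English sentence. It is not the authors' syllabification method, which this page does not describe. The helper `naive_syllables` is a hypothetical orthographic heuristic (consonants grouped with the following vowel run) and is assumed only for illustration; Hindi and Telugu would need a script-aware segmenter.

```python
import re

# Assumption: a crude, English-only orthographic heuristic, not the paper's method.
# Each chunk is: optional consonants, then one or more vowels; trailing
# consonants at the end of the word are attached to the last chunk.
_VOWELS = "aeiouyAEIOUY"
_CHUNK = re.compile(rf"[^{_VOWELS}]*[{_VOWELS}]+(?:[^{_VOWELS}]*$)?")


def naive_syllables(word):
    """Split one word into syllable-like chunks (hypothetical helper)."""
    chunks = _CHUNK.findall(word)
    # Words without any vowel letter are kept whole.
    return chunks if chunks else [word]


def as_words(sentence):
    """Word-level input: whitespace tokens."""
    return sentence.split()


def as_characters(sentence):
    """Character-level input: every non-space character."""
    return [ch for ch in sentence if not ch.isspace()]


def as_syllables(sentence):
    """Syllable-level input: syllable-like chunks of every word, in order."""
    return [chunk for word in sentence.split() for chunk in naive_syllables(word)]


if __name__ == "__main__":
    text = "syllables help classification"
    print(as_words(text))
    print(as_syllables(text))
    print(len(as_characters(text)))
```

The syllable sequence sits between the other two in length: longer than the word sequence, so sub-word patterns are exposed, but much shorter than the character sequence.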