IIIT Hyderabad Publications
Part-of-Speech Tagging for Code mixed English-Telugu Social media data
Authors: Kovida Nelakuditi, jittadivya.sai, Radhika Mamidi
Conference: 17th International Conference on Intelligent Text Processing and Computational Linguistics
Report no: IIIT/TR/2016/39
Part-of-Speech (POS) tagging is a primary and important step for many Natural Language Processing applications. POS taggers have reported high accuracies on grammatically correct monolingual data. This paper reports work on annotating code-mixed English-Telugu data collected from the social media site Facebook and on creating automatic POS taggers for this corpus. We treat POS tagging as a classification problem and use different classifiers, such as Linear SVMs, CRFs, and Multinomial Bayes, with different combinations of features that capture both the context of a word and its internal structure. We also report our experiments on combining monolingual POS taggers to tag this code-mixed English-Telugu data.
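The abstract does not list the actual features, so the following is only a minimal sketch of what "context" and "internal structure" features for a token might look like. The function name `token_features`, the chosen features, the boundary markers, and the example sentence are all illustrative assumptions, not taken from the paper or its corpus.

```python
def token_features(tokens, i, affix_len=3):
    """Build a feature dict for tokens[i].

    Hypothetical feature set: the paper's real features are not
    given in the abstract.
    """
    word = tokens[i]
    lower = word.lower()
    feats = {
        # Internal structure of the word itself
        "word": lower,
        "is_capitalized": word[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "is_hashtag_or_mention": word[:1] in ("#", "@"),
        # Context: neighbouring words, with assumed boundary markers
        "prev_word": tokens[i - 1].lower() if i > 0 else "<s>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }
    # Character prefixes and suffixes up to affix_len characters
    for n in range(1, affix_len + 1):
        if len(lower) >= n:
            feats[f"prefix_{n}"] = lower[:n]
            feats[f"suffix_{n}"] = lower[-n:]
    return feats


# Invented romanized code-mixed sentence, for illustration only
sentence = ["Nenu", "movie", "ki", "veltunna", "tomorrow"]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

Feature dictionaries of this kind could then be vectorized and passed, one token at a time, to a classifier such as a linear SVM, or as whole sequences to a CRF.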
Full paper: pdf
Centre: Language Technologies Research Centre