IIIT Hyderabad Publications
Named Entity Recognition for Hindi-English Code-Mixed Social Media Text

Authors: Vinay Singh, Deepanshu Vijay, Syed S. Akhtar, Manish Shrivastava
Conference: 56th Annual Meeting of the Association for Computational Linguistics (NEWS 2018, ACL 2018)
Location: Melbourne, Australia
Date: 2018-07-15
Report no: IIIT/TR/2018/52

Abstract

Named Entity Recognition (NER) is a major task in the field of Natural Language Processing (NLP) and a subtask of Information Extraction. The challenge of NER for tweets lies in the insufficient information available in a tweet. There has been a significant amount of work on entity extraction, but only for resource-rich languages and domains such as newswire. Entity extraction is, in general, a challenging task for such informal text, and code-mixed text further complicates the process with its unstructured and incomplete information. We propose experiments with different machine learning classification algorithms using word, character and lexical features. The algorithms we experimented with are Decision Tree, Long Short-Term Memory (LSTM), and Conditional Random Field (CRF). In this paper, we present a corpus for NER in Hindi-English code-mixed text, along with extensive experiments on our machine learning models, which achieved a best F1-score of 0.95 with both CRF and LSTM.

Full paper: pdf
Centre for Language Technologies Research Centre
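The abstract mentions word, character and lexical features without listing them. As a rough illustration only, the sketch below shows how per-token features of that kind might be built for a sequence tagger such as a CRF. The specific features (character n-grams, capitalization, hashtag and mention flags, neighbouring words), the function names and the example tweet are all assumptions made for this sketch and are not taken from the paper.

```python
def char_ngrams(token, n_min=2, n_max=3):
    """Character n-grams of a lowercased token, with '<' and '>' as boundary markers."""
    padded = f"<{token.lower()}>"
    return [
        padded[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(padded) - n + 1)
    ]


def token_features(tokens, i):
    """Hypothetical feature dictionary for the token at position i."""
    tok = tokens[i]
    feats = {
        # Word-level features (assumed, not the paper's exact set)
        "word.lower": tok.lower(),
        "prev_word": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_word": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        # Lexical / surface features typical for tweets
        "is_capitalized": tok[:1].isupper(),
        "is_hashtag": tok.startswith("#"),
        "is_mention": tok.startswith("@"),
        "has_digit": any(c.isdigit() for c in tok),
    }
    # Character-level features: one binary feature per n-gram
    for gram in char_ngrams(tok):
        feats[f"ngram={gram}"] = True
    return feats


def sentence_features(tokens):
    """Feature dictionaries for every token in a tokenized tweet."""
    return [token_features(tokens, i) for i in range(len(tokens))]


# Made-up code-mixed example, not drawn from the paper's corpus
example = ["@user", "Modi", "ji", "ne", "Delhi", "mein", "#rally", "ki"]
features = sentence_features(example)
```

A list of dictionaries like `features` is the input shape commonly accepted by CRF toolkits; the feature set that actually worked best is described in the full paper.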