IIIT Hyderabad Publications
Semisupervised Data Driven Word Sense Disambiguation for Resource-poor Languages
Authors: Pratibha Rani, Vikram Pudi, Dipti Misra Sharma
Conference: 14th International Conference on Natural Language Processing (ICON 2017)
Date: 2017-12-18
Report no: IIIT/TR/2017/108

Abstract
In this paper, we present a generic semi-supervised Word Sense Disambiguation (WSD) method. Existing WSD methods make extensive use of domain resources and linguistic knowledge. Our method instead extracts context-based lists from a small amount of sense-tagged and untagged training data, without using domain knowledge. Experiments on the Tourism and Health domains for Hindi and Marathi show that it performs well without using any language-specific linguistic information other than the sense IDs present in the sense-tagged training set, and that it works well even with small training data because it handles the data sparsity issue. Other advantages are that no domain expertise is needed to craft and select features for the WSD model, and that the method can handle cases where no matching context is available in the sense-tagged training set. It also finds sense IDs for test words that are absent from the sense-tagged training set but whose associated sense IDs are present in it. This feature can help human annotators prepare a sense-tagged corpus for a language by suggesting probable senses for unknown words. These properties make the method generic and especially suitable for resource-poor languages: it can be applied to various languages without requiring a large sense-tagged corpus.

Full paper: pdf
Centre for Data Engineering
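The abstract does not spell out the algorithm, so the following is only a rough, generic illustration of what "context-based lists" keyed by sense ID might look like, and it is not the authors' method. It builds, for each sense ID, a count of words seen near tokens tagged with that sense, then assigns a test occurrence the candidate sense whose list overlaps most with its context. The window size, the overlap scoring, the toy English data, the sense IDs `FIN`/`RIV`, the sense inventory, and all function names are hypothetical assumptions, and the semi-supervised use of untagged data is omitted entirely.

```python
from collections import Counter, defaultdict


def context_window(words, i, window):
    """Return the words within `window` positions of index i, excluding i itself."""
    lo, hi = max(0, i - window), min(len(words), i + window + 1)
    return [words[j] for j in range(lo, hi) if j != i]


def build_context_lists(tagged_sentences, window=2):
    """Map each sense ID to a Counter of words seen near tokens tagged with it.

    tagged_sentences: list of sentences, each a list of (word, sense_id) pairs,
    where sense_id is None for tokens that carry no sense tag.
    (Hypothetical sketch: untagged corpora are not used here.)
    """
    lists = defaultdict(Counter)
    for sent in tagged_sentences:
        words = [w for w, _ in sent]
        for i, (_, sense) in enumerate(sent):
            if sense is not None:
                lists[sense].update(context_window(words, i, window))
    return lists


def disambiguate(words, i, candidate_senses, lists, window=2):
    """Pick the candidate sense whose context list overlaps most with the test context.

    Ties (including zero overlap for every candidate) go to the earliest candidate.
    """
    ctx = context_window(words, i, window)
    best, best_score = None, -1
    for sense in candidate_senses:
        counts = lists.get(sense, Counter())  # avoid inserting unseen senses
        score = sum(counts[w] for w in ctx)
        if score > best_score:
            best, best_score = sense, score
    return best


# Toy, hypothetical data: English words and made-up sense IDs.
tagged = [
    [("deposit", None), ("money", None), ("in", None), ("bank", "FIN"), ("account", None)],
    [("sat", None), ("on", None), ("river", None), ("bank", "RIV"), ("grass", None)],
]
# Hypothetical sense inventory listing candidate sense IDs per word.
inventory = {"bank": ["FIN", "RIV"]}

contexts = build_context_lists(tagged)

if __name__ == "__main__":
    print(disambiguate(["money", "in", "bank", "account"], 2, inventory["bank"], contexts))  # FIN
    print(disambiguate(["grass", "near", "river", "bank"], 3, inventory["bank"], contexts))  # RIV
```

Because the lists in this sketch are keyed by sense ID rather than by word, a word that never appears in the tagged data can still be scored as long as its candidate sense IDs appear there on other words. This loosely mirrors the abstract's point about unseen test words, but how the paper actually achieves that is not described on this page.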
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.