Word Sense Disambiguation Using Semantic Categories, Domain Information and Knowledge Sources

Author: Siva Reddy
Date: 2010-09-14
Report no: IIIT/TH/2010/36
Advisor: Rajeev Sangal

Abstract

Words can have more than one distinct meaning, and many words can be interpreted in multiple ways depending on the context in which they occur. This phenomenon poses challenges to Natural Language Processing systems. State-of-the-art methods for resolving word ambiguity make use of manually annotated data. However, obtaining such data is costly for certain languages and domains. In this thesis we have developed word sense disambiguation (WSD) methods for languages and domains where no annotated data is available. We have proposed unsupervised corpus-based methods for Semantic Category Labeling, a task very similar to coarse-grained WSD and hence easier than traditional WSD. Methods that rely on a lexical knowledge base (WordNet) are also evaluated. Furthermore, we have developed methods for domain-specific WSD and evaluated their performance on the “domain specific WSD of all words” task, which is part of ACL SemEval-2010. The results reveal the importance of domain information in domain-specific WSD. Very little work has been reported on integrating various knowledge sources for WSD: current methods take advantage of only a few knowledge sources and do not use them collectively. We propose a novel framework that models information from various knowledge sources as constraints and uses them collectively for disambiguation. Our initial experimental results are competitive with those of state-of-the-art methods.


Centre for Language Technologies Research Centre

IIIT Hyderabad Publications
