Interpretation of Nominal Compounds Using Structured Knowledgebases

Author: Sruti Rallapalli
Date: 2017-12-02
Report no: IIIT/TH/2017/98
Advisor: Soma Paul

Abstract

This thesis investigates different approaches proposed in the literature to identify the semantic relations in nominal compounds. Nominal compounds have received a great deal of attention in the last decade, owing to two main reasons: the increasing frequency with which nominal compounds occur in language, and the challenges they pose in most natural language processing applications, such as information extraction and retrieval, document summarization, and machine translation. Interpreting the semantics of nominal compounds in particular has been a popular research topic in recent times for reasons beyond the sheer prevalence of compounds, such as the difficulty of explicating the semantic relation between the constituents of a compound, which is not as obvious to a machine as it is to a native speaker of the language. Most of the approaches proposed prior to this work either are statistical techniques or treat compound interpretation as a classification task using a standard set of semantic relations as the prediction labels. The drawback of purely statistical and computational approaches is that they often require a large training dataset or corpus with a good distribution across all types of semantic relations, and often have no underlying lexical knowledgebases to fall back on when data sparsity issues arise. Some of these approaches even involve preprocessing large, unstructured databases such as dictionaries and corpora, which is inefficient and time-consuming. I motivate the need for exploring a new school of thought, quite opposite to the approaches that use computations backed up by knowledgebases. In this thesis, I explore the creation of hybrid systems that mine information from lexical and semantic knowledge resources like ConceptNet and WordNet, and are backed by computational measures as necessary to handle compounds outside the scope of these resources.
Such systems overcome the obvious drawbacks of purely statistical and supervised learning approaches. I discuss two popular generic-domain ontologies, WordNet and ConceptNet, in terms of the representational schema they adopt to capture nominal compounds, and from the perspective of using them to label compounds with semantic relations. The most frequent issues encountered with these ontologies were the lack of sufficient semantic information for certain nominal compounds, and their incompleteness in capturing all the different categories of nominal compounds using a generic representational schema. I propose a new representational schema that identifies the uniqueness of compounds, and build a well-structured, purpose-centric ontology called PurposeNet centered around this schema. The ontology-search approach proposed in this thesis uses simple lookup techniques on an indexed and preprocessed PurposeNet to derive the semantic relation between the head and the modifier in a nominal compound. The same ontology search was then tailored to suit the representations used in WordNet and ConceptNet. A comparison of the ontology-search results on all three ontologies indicates that the best performance was on PurposeNet. While the results are encouraging, the ontology-search approach is not robust enough to handle different types of nominal compound constructions in the generic domain. I show that combining this ontology-search approach with semantic relatedness measures based on WordNet makes our compound interpretation system robust, because it collates semantic information from multiple resources. This hybrid system achieves state-of-the-art results on our standard test set of compounds. Lastly, I also study the feasibility of using the same approach, based on PurposeNet and English WordNet as is, to label nominal compounds in other languages that have meager knowledge resources.
I choose to extend this approach to nominal compounds in Hindi, by translating compounds from Hindi to English using different lexical resources like the Hindi WordNet and the IITB Hindi-Universal Word dictionary, and then running them through our hybrid compound interpretation system. While the results are not very encouraging, due to issues such as changes in the structure of a compound after translation or lack of coverage for borrowed words in the lexical resources used, they nevertheless indicate that this approach is reusable for interpreting compounds from other languages, with the help of translation systems that take into account the context and structure of the nominal compound in its source language.
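
The hybrid pipeline summarized above (ontology lookup first, a relatedness-based fallback for compounds the ontology does not cover) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation from the thesis: the toy index, the relation labels, the relatedness scores, and the fallback rule (borrowing the relation of the most related known compound) are all hypothetical stand-ins for PurposeNet and for WordNet-based relatedness measures.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical toy index standing in for a preprocessed ontology:
# (modifier, head) -> semantic relation label. Labels are illustrative only.
ONTOLOGY_INDEX: Dict[Tuple[str, str], str] = {
    ("kitchen", "knife"): "LOCATION",
    ("bread", "knife"): "PURPOSE",
    ("steel", "knife"): "MATERIAL",
}

# Hypothetical word-pair scores standing in for a WordNet-based
# semantic relatedness measure (values are made up for the example).
RELATEDNESS: Dict[FrozenSet[str], float] = {
    frozenset({"cake", "bread"}): 0.8,
    frozenset({"iron", "steel"}): 0.9,
    frozenset({"cake", "steel"}): 0.1,
}


def relatedness(a: str, b: str) -> float:
    """Return a symmetric relatedness score in [0, 1]; unknown pairs score 0."""
    if a == b:
        return 1.0
    return RELATEDNESS.get(frozenset({a, b}), 0.0)


def ontology_lookup(modifier: str, head: str) -> Optional[str]:
    """Direct lookup of the compound in the indexed ontology."""
    return ONTOLOGY_INDEX.get((modifier, head))


def interpret(modifier: str, head: str, threshold: float = 0.5) -> Optional[str]:
    """Label a compound: try the ontology first, then fall back to relatedness."""
    relation = ontology_lookup(modifier, head)
    if relation is not None:
        return relation

    # Assumed fallback: borrow the relation of the most related known compound.
    best_score, best_relation = 0.0, None
    for (known_mod, known_head), known_relation in ONTOLOGY_INDEX.items():
        score = relatedness(modifier, known_mod) * relatedness(head, known_head)
        if score > best_score:
            best_score, best_relation = score, known_relation
    return best_relation if best_score >= threshold else None
```

In this sketch, a compound such as "cake knife" is absent from the toy index, so it is labeled by analogy with "bread knife"; a compound with no sufficiently related neighbour is left unlabeled rather than guessed.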

Full thesis: pdf

Language Technologies Research Centre

IIIT Hyderabad Publications
