IIIT Hyderabad Publications
“TOWARDS BUILDING A LEXICAL ONTOLOGY RESOURCE BASED ON INTRINSIC SENSES OF WORDS”
Report no: IIIT/TH/2016/37
The relation between reality and language is an age-old problem. The term ‘meaning’ embodies a host of issues that are factored into this problem. One contemporary aspect of the problem is how to specify the ‘meaning’ of a word computationally. Conceiving a web resource for the ‘meaning’ of a word involves ontological as well as linguistic considerations. Usually the ‘meaning’ of a word is specified using other words, as in dictionaries and thesauruses. Ontologically, the ‘meaning’ of a word is thought to be specified in terms of its participation in classes, relations and events. Both accounts seem extensional, as they refer to objects outside a word to determine its ‘meaning’. On both accounts, words ‘mean’ only in the company of other words or in association with the outlying reality that words represent. We take the view that the ‘meaning’ of a single word has a necessary aspect of intension. This intension consists of in situ formations embodied in the ‘meaning’ of a word. The in situ forms are present intrinsically in the ‘meanings’ of words and are the causes both of the association of words and of the assertion of knowledge. The work presents a new approach to the formal ontology of natural language based on in situ formations in the ‘meanings’ of words and in pairings of words. The basic motivation is to manipulate language computationally at the level of intrinsic lexical meanings. Lexical meanings, in the transaction of language, have discrete intrinsic forms of types and classes. Intrinsic sense-types and sense-classes have been identified for 3867 verbs, 1980 adverbs and 300 adjectives. Identification of sense-types and sense-classes for adjectives and nouns is in progress. These types and classes are unambiguously locatable in parts of speech, first through collective introspective inquiry and then through enrichment with computational methods of corpus study.
This work reports on the construction of a web resource in which the ‘meanings’ of English words are given in terms of a formal ontology of language inspired by Leibniz, Patanjali and Bhartrihari. The experimental results on the test data of the SemEval-2010 word sense induction and disambiguation task, the measure of synonymity of verbs and nouns, and the inferences drawn from the sense-class and sense-type distributions show the potential of the resource. Integrating the resource of sense-types and sense-classes with a lexical grammar of cases, interpreted as verb-noun pairs, leads to paraphrasing English sentences into graphs of formal senses. This gives confidence that the resource can be enriched through these studies, and that the resource exploration framework can be designed so that the resource itself is continuously enriched. The proposed ontology of verbs, adverbs, adjectives and nouns, if it can be implemented for large numbers of words, would make the resource advantageous in knowledge computing. We aim at extensive lexical coverage of the language to show that our approach, unlike semantic-prime approaches, is non-reductionist. A rigorous formal underpinning of the ontology resource will enhance the capability, scope and accuracy of ontology-driven NLP and AI applications. The resource is still growing and as yet covers only a portion of the English language; however, its framework is designed to include resource-enriching tasks and revisionary studies.
Centre for Exact Humanities