IIIT Hyderabad Publications
Distributional Semantics and Neural Network based Improvements to Dependency Parsing

Author: Silpa Kanneganti
Date: 2017-07-27
Report no: IIIT/TH/2017/57
Advisor: Dipti Misra Sharma

Abstract

Natural language processing is a field of artificial intelligence and computational linguistics that aims to bridge the gap between human beings and computers. It deals with processing natural language in various forms, such as speech and text. Processing a natural language happens at several levels: word, phrase, sentence, syntax, semantics, and discourse. In this work we present our efforts at making advancements in dependency parsing at the syntactic and semantic levels. Dependency parsing involves discovering relationships between the words of a given sentence so as to understand it better.

We begin this thesis with a comparative error analysis of two parsers, MALT and MST, on Telugu Dependency Treebank data. We discuss the performance of both parsers on the Telugu language and then examine in detail both the algorithmic issues of the parsers and the language-specific constraints of Telugu. We follow this by proposing a semi-supervised approach that introduces clustered word-embedding features into dependency parsing: a simple and effective method that incorporates dependency label clusters derived from a large annotated corpus into data-driven dependency parsing. We demonstrate the effectiveness of the approach in a series of experiments on Hindi language data. We then propose a neural network based classifier-voting approach to dependency parsing that uses multiple classifiers as component systems in an ensemble and a neural network as an oracle. We show significant improvements over the best component systems for both transition-based and graph-based dependency parsing.

Full thesis: pdf

Centre for Language Technologies Research Centre
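The clustered-embedding idea above can be illustrated with a minimal sketch: cluster word vectors and use each word's cluster ID as a coarse, discrete feature alongside the usual lexical and POS features. The vocabulary, random embeddings, and feature names here are toy assumptions for illustration, not the thesis's actual data or feature templates.

```python
# Hedged sketch: derive cluster-ID features from word embeddings.
# The embeddings below are random stand-ins; in practice they would be
# trained on a large corpus before clustering.
import numpy as np
from sklearn.cluster import KMeans

# Toy vocabulary with random 50-dimensional embeddings (assumption).
vocab = ["raju", "sita", "vachadu", "vellindi", "pustakam", "illu"]
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(len(vocab), 50))

# Cluster the embedding space; each word's cluster ID becomes a discrete
# feature that a MALT/MST-style feature model can consume like a POS tag.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(embeddings)
cluster_of = dict(zip(vocab, (int(c) for c in kmeans.labels_)))

def token_features(word, pos):
    """Feature dict for one token: lexical, POS, and cluster-ID features."""
    return {"word": word, "pos": pos, "cluster": f"C{cluster_of.get(word, -1)}"}

print(token_features("raju", "NNP"))
```

Out-of-vocabulary words fall back to a sentinel cluster (`C-1`), so the feature is always defined at parse time.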
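The classifier-voting scheme can likewise be sketched at a high level: each component parser proposes a head for every token, and a small neural classifier (the "oracle") learns which component to trust. The feature dimensions, synthetic data, and `MLPClassifier` choice below are assumptions for illustration only, not the thesis's architecture.

```python
# Hedged sketch of ensemble voting with a neural oracle, on synthetic data.
# For each token, the oracle picks one of n_parsers component systems and
# the ensemble adopts that system's proposed head.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n_tokens, n_feats, n_parsers = 200, 10, 3

X = rng.normal(size=(n_tokens, n_feats))  # token/context features (toy)
# Synthetic training signal: index of the component whose arc is correct.
best_parser = rng.integers(0, n_parsers, size=n_tokens)

# A small feed-forward network serves as the oracle (illustrative choice).
oracle = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
oracle.fit(X, best_parser)

# At parse time: each component proposes a head index per token; the
# oracle's per-token choice selects which proposal enters the ensemble tree.
proposed_heads = rng.integers(0, 20, size=(n_tokens, n_parsers))
choices = oracle.predict(X)
ensemble_heads = proposed_heads[np.arange(n_tokens), choices]
print(ensemble_heads.shape)
```

A real system would also need to ensure the selected arcs form a well-formed tree; this sketch only shows the per-token voting step.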
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.