IIIT Hyderabad Publications
Transition-based Technique for Syntactic Linearization and Deep Input Linearization

Author: Ratish Puduppully
Date: 2017-02-22
Report no: IIIT/TH/2017/19
Advisor: Manish Shrivastava

Abstract

Transition-based techniques were originally introduced for syntactic parsing, where they have achieved the highest accuracies for both constituency and dependency parsing. In earlier work, the transition-based technique was extended to the task of syntactic linearization. In this work, we improve the accuracy of transition-based syntactic linearization and introduce a transition-based technique for deep input linearization, demonstrating the effectiveness of transition-based techniques for linearization.

Syntactic linearization is the task of producing a linearized sentence, along with its dependency tree, from an input bag of words and an optional set of POS and dependency constraints. The task is important for text-to-text applications such as machine translation, summarization, and dialogue generation. We identify a feature sparsity issue in state-of-the-art transition-based models for syntactic linearization, and propose a modification to the standard transition-based feature structure that reduces feature sparsity and allows lookahead features at a small cost to decoding efficiency. Our model gives the best reported accuracies on all benchmarks, while being over 30 times faster than traditional best-first search.

Deep input linearization produces a linearized sentence from a deep graph over a set of lemmas. The deep input type is intended to be an abstract representation of the meaning of a sentence. Unlike semantic input, where the nodes are semantic representations, deep input is more surface-centric, with the lemmas of content words connected by semantic labels. In contrast to shallow syntactic trees, function words in surface forms are not included in deep graphs.
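To make the setting concrete, the following is a minimal sketch of a transition system for syntactic linearization: the state holds an unordered bag of words, a stack, the surface order built so far, and the dependency arcs, and shift/left-arc/right-arc actions build both the order and the tree. A toy bigram scorer and greedy decoding stand in for the thesis's learned feature model and beam search; all names here are illustrative, not from the thesis.

```python
class State:
    """Decoder state: an unordered input bag, a stack of partial subtrees,
    the surface order built so far, and the dependency arcs."""
    def __init__(self, words):
        self.buffer = set(words)   # unordered bag of input words
        self.stack = []
        self.order = []
        self.arcs = []             # (head, dependent) pairs

    def shift(self, word):
        # Move a word from the bag onto the stack and into the output order.
        self.buffer.remove(word)
        self.stack.append(word)
        self.order.append(word)

    def left_arc(self):
        # The stack top becomes the head of the word beneath it.
        dep = self.stack.pop(-2)
        self.arcs.append((self.stack[-1], dep))

    def right_arc(self):
        # The word beneath the stack top becomes the head of the top.
        dep = self.stack.pop()
        self.arcs.append((self.stack[-1], dep))


def greedy_linearize(words, bigram_score):
    """Greedily shift the highest-scoring next word under a toy bigram
    scorer (the actual system searches over all action types with a beam)."""
    state = State(words)
    prev = "<s>"
    while state.buffer:
        best = max(state.buffer,
                   key=lambda w: bigram_score.get((prev, w), 0.0))
        state.shift(best)
        prev = best
    return state.order
```

For example, with bigram scores favoring "John saw Mary", `greedy_linearize({"John", "saw", "Mary"}, scores)` recovers that order from the unordered bag.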
Deep inputs commonly occur as input to Natural Language Generation (NLG) systems, where entities and content words are available and one has to generate a grammatical sentence from them, with provision only for inflecting words and introducing function words. Such use cases include summarization, dialogue generation, etc. Traditional methods for deep NLG adopt pipeline approaches comprising stages such as constructing the syntactic input, predicting function words, linearizing the syntactic input, and generating the surface forms. Though easier to visualize, pipeline approaches suffer from error propagation; in addition, information available in one module cannot be leveraged by the others. We construct a transition-based model that jointly performs linearization, function word prediction, and morphological generation, which considerably improves accuracy over a pipelined baseline system. On a standard deep input linearization shared task, our system achieves the best results reported so far.

Full thesis: pdf

Centre for Language Technologies Research Centre
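The joint approach described in the abstract can be illustrated with a toy decoder in which every step chooses among action types in one search: either shift a remaining lemma with a chosen inflected form (morphological generation) or insert a function word absent from the input (function word prediction), all scored together. The inflection table, function-word list, and bigram scorer below are illustrative assumptions, not the thesis's actual model.

```python
# Toy joint decoder over lemmas: SHIFT realizes a content lemma as one of
# its inflected forms; INSERT adds a function word not present in the input.
INFLECTIONS = {"boy": ["boy", "boys"], "walk": ["walk", "walks", "walked"]}
FUNCTION_WORDS = ["the", "a"]

def joint_decode(lemmas, score, max_inserts=2):
    """Greedily pick the best action under a toy bigram scorer; a real
    system would beam-search over the same joint action space."""
    remaining = list(lemmas)
    out, prev, inserts = [], "<s>", 0
    while remaining:
        # Enumerate every action available in the current state.
        candidates = [("SHIFT", lemma, form)
                      for lemma in remaining
                      for form in INFLECTIONS[lemma]]
        if inserts < max_inserts:
            candidates += [("INSERT", None, fw) for fw in FUNCTION_WORDS]
        act, lemma, token = max(candidates,
                                key=lambda c: score.get((prev, c[2]), 0.0))
        if act == "SHIFT":
            remaining.remove(lemma)
        else:
            inserts += 1
        out.append(token)
        prev = token
    return " ".join(out)
```

Because inflection and insertion compete in the same candidate set, the scorer can prefer "the boy walked" over any ordering of the bare lemmas "boy walk", which a pipeline that fixes word order before choosing forms could not reconsider.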