Dependency Parsing and Empty Category Detection in Hindi Language

Author: Puneeth Kukkadapu
Date: 2016-03-23
Report no: IIIT/TH/2016/9
Advisor: Dipti Misra Sharma

Abstract

Parsing Indian languages has always been a challenging task, and in recent years various approaches have been explored to improve parsing accuracy for Hindi and other Indian languages. In this work, we present our experiments on improving dependency parsing accuracy for Hindi as part of the COLING-MTPIL 2012 shared task. We explored three data-driven parsers using a large feature pool of morphological, chunk and syntactic features. We tried different parser configurations, varying the parsing strategies, classifiers and feature templates. We explored the usage and adoption of the Turbo Parser for parsing Indian languages, and in addition to the Turbo Parser we explored two other data-driven parsers: Malt and MST. We also experimented with two approaches for getting the best out of these parsers. We selected the best configuration for each data set and produced the best average accuracy in the shared task. On data with gold POS tags, our best result was 96.50% unlabeled attachment score (UAS), 92.90% labeled accuracy (LA) and 91.49% labeled attachment score (LAS), obtained using the voting method. On data with automatic POS tags, our best result was 93.99% UAS, 90.04% LA and 87.84% LAS.

The second part of this thesis focuses on using statistical dependency parsing to detect NULLs, or empty categories, in sentences. These experiments use the Hindi dependency treebank. Rule-based approaches to detecting empty heads in Hindi had been tried before, but statistical learning for automatic prediction had not been demonstrated. Our approach introduces complex labels into the data in order to predict empty categories in sentences. We mapped the problem of empty category prediction to data-driven parsing and explored various parsers to find the best one for this approach.
The motivation comes from using data-driven parsing to solve other tasks in natural language processing. The system predicted empty categories with an F-score of 76.26. We also discuss the shortcomings and difficulties of this approach.
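For readers unfamiliar with the three parsing metrics reported above, the following minimal sketch shows how UAS, LA and LAS are conventionally computed for one sentence. It is an illustration under the standard definitions, not the shared task's evaluation script; the function name `attachment_scores`, the `(head, label)` token representation and the example labels are assumptions made for this example.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# This representation is an assumption made for illustration.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float, float]:
    """Return (UAS, LA, LAS) as percentages over the tokens of one sentence."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    n = len(gold)
    # UAS: the predicted head is correct, label ignored.
    head_ok = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LA: the predicted label is correct, head ignored.
    label_ok = sum(g[1] == p[1] for g, p in zip(gold, pred))
    # LAS: both head and label are correct.
    both_ok = sum(g == p for g, p in zip(gold, pred))
    return (100.0 * head_ok / n, 100.0 * label_ok / n, 100.0 * both_ok / n)


# Hypothetical four-token sentence; the labels are illustrative only.
gold = [(2, "k1"), (0, "main"), (2, "k2"), (2, "rsym")]
pred = [(2, "k1"), (0, "main"), (2, "k1"), (3, "rsym")]
uas, la, las = attachment_scores(gold, pred)
print(f"UAS={uas:.2f} LA={la:.2f} LAS={las:.2f}")
```

Since a token counts toward LAS only when both its head and its label are right, LAS can never exceed UAS or LA, which matches the ordering of the scores reported in the abstract.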


Centre for Language Technologies Research Centre

IIIT Hyderabad Publications
