Two semantic features make all the difference in Parsing accuracy.

Authors: Akshar Bharati, Samar Husain, Bharat Ambati, Sambhav Jain, Dipti M Sharma, Rajeev Sangal
Conference: In Proceedings of the 6th International Conference on Natural Language Processing (ICON-08), CDAC Pune, India. 2008.

Date: 2008-11-18
Report no: IIIT/TR/2008/138

Abstract

The paper describes experiments on a Hindi dependency treebank that systematically investigate crucial learning issues which arise in building a robust Hindi parser. We do this by training two data-driven dependency parsers on the treebank and using these experiments to test various conjectures. The results either validate the conjectures or lead us to reframe them. The whole process helps in systematically isolating the information crucial for parsing. Many interesting facts emerge from these experiments, such as how certain intuitive features fail to improve parser performance, which linguistic phenomena are difficult to learn, and how minimal semantics can help in identifying some core relations. The final performance obtained for parsing Hindi is encouraging: the best labeled and unlabeled attachment scores are 69.64% and 88.67% respectively, on a treebank as small as 1200 sentences.

Full paper: pdf

Centre for Language Technologies Research Centre

IIIT Hyderabad Publications
