IIIT Hyderabad Publications
Applications of Data-Driven Dependency Rules

Author: VARUN KUCHIBHOTLA (200702052)
Date: 2024-07-04
Report no: IIIT/TH/2024/161
Advisor: Dipti Misra Sharma

Abstract

In this paper, we present an approach to integrating unlexicalised grammatical features into the Malt dependency parser. Malt is a lexicalised parser and, like every lexicalised parser, it is prone to data sparseness. We address this problem by supplying features from an unlexicalised parser; unlike lexicalised parsers, unlexicalised parsers are known for their robustness. We build a simple unlexicalised grammatical parser that uses POS tag sequences as grammar rules, and we pass its output to Malt as additional features. This yields UAS improvements of about 0.17 to 0.30 percent over the state-of-the-art Malt results for both English and Hindi.

Word sketches are one-page, automatic, corpus-based summaries of a word’s grammatical and collocational behaviour. They are widely used in language study and in lexicography. Sketch Engine is a leading corpus tool that takes a corpus as input and generates word sketches for the words of that language. It also generates a thesaurus and ‘sketch differences’, which specify the similarities and differences between near-synonyms. In this paper, we present the functionalities of Sketch Engine for Hindi. We collected HindiWaC, a web-crawled Hindi corpus of 240 million words, lemmatized and POS-tagged it, and then loaded it into Sketch Engine.

Centre: Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.