IIIT Hyderabad Publications
Tools for Linguistic and Semantic Resource Development
Author: Grandhi Sai Venkata Harsha Vardhan 201002078
Date: 2024-07-04
Report no: IIIT/TH/2024/155
Advisor: Soma Paul
Abstract
This thesis explores three projects aimed at advancing linguistic and semantic resource development in natural language processing (NLP). The first project constructs a knowledge base tailored to user-oriented documents in the AC manual domain using the PurposeNet architecture. It aims to systematically extract and organize domain-specific information from diverse textual sources, making it easier to understand and retrieve the technical information needed for user support and maintenance tasks.
The second project addresses the automated conversion of Indian language morphological processors into Grammatical Framework (GF). GF is open-source software that supports semantic abstraction and linguistic generalization through abstract syntax in a multilingual environment. This makes it well suited to automatic multilingual translation, with the abstract syntax serving as an interlingua. As a first step towards building a translation system for multiple Indian languages on the GF platform, we aim to develop an automatic converter that converts morphological processors available in various formats for Indian languages into GF format. As part of this project, we develop a deterministic automatic converter that converts LTtoolbox and ILMT morphological processors into GF format.
The third project presents an Intelligent Interactive Editor designed to facilitate the creation of Controlled Natural Language (CNL) for Hindi text. CNL is a language-independent information system that captures accurate syntactico-semantic representations of source languages, ensuring clarity and precision in language processing.
The editor combines several state-of-the-art tools with custom-built tools to auto-populate most of the fields required in the CNL format, significantly easing the user's workload. These tools are integrated through a plugin architecture, so the editor can adopt new tools as they become available. The editor also features a mapper layer that maps the intermediate representations used by the tools to the CNL format through extendable rules, provided developers adhere to the required input-output format. CNL is a foundational element for developing a robust multilingual natural language generation (NLG) tool that is not limited by reliance on a single source language.
Together, these projects advance linguistic and semantic resource development for NLP applications. They provide tools and methodologies that contribute to foundational research in semantic representation and computational linguistics, offering practical solutions to improve the performance and applicability of NLP systems across diverse linguistic contexts.
Full thesis: pdf
Centre for Language Technologies Research Centre