IIIT Hyderabad Publications
Kathaa: A Visual Programming Framework for Natural Language Processing Systems
Author: spmohanty
Date: 2017-03-27
Report no: IIIT/TH/2017/14
Advisor: Dipti Misra Sharma

Abstract

I proudly present Kathaa, an open-source, web-based visual programming framework for Natural Language Processing (NLP) systems. Kathaa supports the design, execution and analysis of complex NLP systems: users choose NLP components from an existing (and easily extensible) module library and connect them visually. It models an NLP system as an edge-labeled directed acyclic multigraph (with optionally parallelized information flow), and lets users choose publicly co-created modules and use them in their own NLP applications, irrespective of their technical proficiency in NLP or in handling complex software systems. Kathaa exposes an intuitive web-based interface through which users interact with and modify complex NLP systems, and a precise module definition API that allows easy and dynamic integration of new state-of-the-art NLP components (with their associated services packaged as Docker containers). Kathaa enables researchers to publish their services in a standardized format, so that anyone who wants to use and play with them can do so right out of the box, whether or not they understand the intricacies of the underlying research. In simpler words, the goal is to make the basic primitives of Natural Language Processing accessible to everyone, and to let everyone easily use, play with and adapt them. Research in Natural Language Processing has applications in a wide range of domains.
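The edge-labeled directed acyclic multigraph model described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Module` and `Graph` classes, the port names and the toy `tokenizer`, `lowercaser` and `formatter` components are illustrative assumptions, not Kathaa's actual module definition API. Each module is a black box with named input and output ports, and each edge is labeled with the pair of ports it connects, so two modules may be joined by several edges (here `tokenizer` feeds `formatter` through both its `tokens` and `count` outputs). Modules are grouped into dependency levels with Kahn's algorithm, and the modules within one level are run in a thread pool as one possible reading of "optionally parallelized" information flow.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical module definition: a black box with named input/output ports.
@dataclass
class Module:
    name: str
    inputs: List[str]
    outputs: List[str]
    process: Callable[[Dict[str, object]], Dict[str, object]]

# Each edge is labeled by the ports it joins: (src module, src port, dst module, dst port).
# Two modules may be joined by several edges, so the graph is a multigraph.
Edge = Tuple[str, str, str, str]


class Graph:
    def __init__(self) -> None:
        self.modules: Dict[str, Module] = {}
        self.edges: List[Edge] = []

    def add_module(self, module: Module) -> None:
        self.modules[module.name] = module

    def connect(self, src: str, src_port: str, dst: str, dst_port: str) -> None:
        if src_port not in self.modules[src].outputs or dst_port not in self.modules[dst].inputs:
            raise ValueError(f"unknown port on edge {src}.{src_port} -> {dst}.{dst_port}")
        self.edges.append((src, src_port, dst, dst_port))

    def levels(self) -> List[List[str]]:
        """Group modules into dependency levels (Kahn's algorithm); reject cycles."""
        indegree = {name: 0 for name in self.modules}
        successors: Dict[str, List[str]] = defaultdict(list)
        for src, _, dst, _ in self.edges:
            indegree[dst] += 1  # each parallel edge adds one
            successors[src].append(dst)
        frontier = [name for name, deg in indegree.items() if deg == 0]
        result: List[List[str]] = []
        while frontier:
            result.append(frontier)
            next_frontier = []
            for name in frontier:
                for dst in successors[name]:
                    indegree[dst] -= 1
                    if indegree[dst] == 0:
                        next_frontier.append(dst)
            frontier = next_frontier
        if sum(len(level) for level in result) != len(self.modules):
            raise ValueError("graph contains a cycle")
        return result

    def run(self, source_inputs: Dict[str, Dict[str, object]]) -> Dict[Tuple[str, str], object]:
        """Execute the graph; returns every (module, output port) -> value."""
        values: Dict[Tuple[str, str], object] = {}

        def execute(name: str) -> Tuple[str, Dict[str, object]]:
            args = dict(source_inputs.get(name, {}))
            for src, src_port, dst, dst_port in self.edges:
                if dst == name:
                    args[dst_port] = values[(src, src_port)]
            return name, self.modules[name].process(args)

        with ThreadPoolExecutor() as pool:
            for level in self.levels():
                # Modules in one level never depend on each other, so they may run in parallel.
                for name, outputs in list(pool.map(execute, level)):
                    for port, value in outputs.items():
                        values[(name, port)] = value
        return values


# Toy components (hypothetical) wired as a small pipeline.
tokenizer = Module("tokenizer", ["text"], ["tokens", "count"],
                   lambda a: {"tokens": a["text"].split(), "count": len(a["text"].split())})
lowercaser = Module("lowercaser", ["tokens"], ["tokens"],
                    lambda a: {"tokens": [t.lower() for t in a["tokens"]]})
formatter = Module("formatter", ["original", "lowered", "count"], ["summary"],
                   lambda a: {"summary": f'{a["count"]}: {" ".join(a["original"])} -> {" ".join(a["lowered"])}'})

g = Graph()
for m in (tokenizer, lowercaser, formatter):
    g.add_module(m)
g.connect("tokenizer", "tokens", "lowercaser", "tokens")
g.connect("tokenizer", "tokens", "formatter", "original")  # two parallel edges
g.connect("tokenizer", "count", "formatter", "count")      # from tokenizer to formatter
g.connect("lowercaser", "tokens", "formatter", "lowered")

out = g.run({"tokenizer": {"text": "Kathaa Connects NLP Modules"}})
print(out[("formatter", "summary")])  # 4: Kathaa Connects NLP Modules -> kathaa connects nlp modules
```

In this sketch a module only sees the values arriving on its input ports, so replacing one tokenizer with another only requires that the replacement expose the same port names; that is the kind of black-box separation between the design of a pipeline and the implementation of its components that the abstract describes.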
But while we as NLP researchers happily celebrate the impact of our research, we do not actively make an effort to make life easier for people without a background in Natural Language Processing, especially the rather large group of users who might simply want to reuse some of our ideas and efforts in their own domains, for their own custom problems. With Kathaa, we aim to bridge this gap. One particularly strong motivation behind this work is also to enable fellow researchers to easily replicate and reuse our research without the usual technical hiccups that come with problems of this scale and complexity. Also, somewhere deep down, all of us secretly wish that the learning curve associated with even the simplest ideas in NLP were much lower; Kathaa, by design and by its original motivation, addresses exactly this problem by making it ridiculously easy to interact and play with rather complex NLP systems. With a little creativity, users can easily mash up these ideas in new ways and build creative applications of state-of-the-art Natural Language Processing research. The vision of this thesis is to pave the way for a system like Kathaa to become the Lego blocks of Natural Language Processing research and applications. With Kathaa, I hope to inspire people from completely different backgrounds, motivations and curiosities to play with state-of-the-art results from NLP research without first having to indirectly prove a certain level of technical proficiency before they can do anything productive with these systems.
To help realize this ambition, Kathaa clearly separates the design and implementation layers of NLP systems, and divides and packs every NLP component into a consistent, reusable black box that interfaces with other components through a consistent and robust interface, irrespective of the software environment in which each component resides. We also provide an intuitive visual interface for manipulating the interactions between these components, and demonstrate how NLP components ranging from very simple to very complex in design can easily be included in Kathaa. As a practical use case, we visually implement the Sampark Hindi-Panjabi and Sampark Hindi-Urdu Machine Translation pipelines in Kathaa, demonstrating that Kathaa can handle really complex NLP systems while remaining intuitive and abstract for the end user. With this thesis, I call on all NLP researchers to publish their research through Kathaa and help create an environment where every curious kid and every curious mind can comfortably play with our ideas in Natural Language Processing and adapt them to their heart’s content, without the unnecessary burden of working through silly technical hurdles.

Full thesis: pdf
Centre: Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.