IIIT Hyderabad Publications
Leveraging Tokens in a Natural Language Query for NLIDB Systems

Author: Ashish Palakurthi
Date: 2017-04-25
Report no: IIIT/TH/2017/28
Advisor: Radhika Mamidi

Abstract

Natural Language Interface to Database (NLIDB) systems convert a Natural Language (NL) query into a Structured Query Language (SQL) query and then use that SQL query to retrieve information from a database. The main advantage of an NLIDB system is that it makes information retrieval much easier and, more importantly, allows non-technical users to query a database. In this work, we present an effective use of the tokens in an NL query to address various problems in an NLIDB system. The conversion of an NL query to an SQL query is framed as a token classification problem, from which we derive a novel method of mapping an NL query to an SQL query. Our approach reasonably addresses domain dependency, which is one of the major drawbacks of NLIDB systems. Concepts Identification is a major component of NLIDB systems (Gupta et al., 2012; Srirampur et al., 2014). We improve Concepts Identification (Srirampur et al., 2014) by making use of Stanford Parser dependencies. Our approach is more robust than previously proposed methods. We also discuss how Concepts Identification can be applied to address ellipsis in a dialogue process.

Beyond returning results to a user, it is essential that the result set be relevant and compact. We propose the new problem of generating a compact set of results for a user query. At a higher level, user-system interactions are modeled on patterns frequently observed between a user's current query and previous queries while interacting with a system. Using these models, we propose a novel method of system prompting to help a user obtain a smaller and more relevant set of results.
In addition to providing a compact and relevant set of results, it is imperative that the answers of an NLIDB system be comprehensible even to people who are less familiar with a common language such as English. NLIDB systems use Natural Language Generation (NLG) modules to provide answers in the form of sentences, so it is important that an answer generated by an NLG module be simple to understand. This brings in the problem of text simplification, in which one of the first and most crucial sub-problems is Complex Word Identification (CWI). We address CWI by classifying words as either simple or complex, and we explore a wide range of classifiers for identifying the complex words. This information helps us improve the simplification of an NLIDB system's final output to a user. To summarize, in addition to addressing problems within an NLIDB system, this work touches on post-NLIDB problems such as results processing. All of these problems are tackled using the tokens in an NL query as the basic, driving unit.

Full thesis: pdf

Centre for Language Technologies Research Centre