Towards Building a Domain Independent Dialog System

Author: Prathyusha Jwalapuram
Date: 2018-04-04
Report no: IIIT/TH/2018/3
Advisor: Radhika Mamidi

Abstract

This thesis discusses a mixed-initiative, domain independent dialog system based on a hierarchically structured knowledge base. The system is rule-based and uses dependency relations and part-of-speech (POS) tags obtained from the Stanford Parser, coupled with the hierarchical structure of the knowledge base, to identify the user’s goal. In particular, the system accepts multi-sentence inputs, that is, multiple sentences forming a query in a single user turn. The system was tested for its accuracy in answering questions, and the dialog flow was evaluated subjectively, mainly on the books domain. We show examples of the system developed over the domains of books, movies and restaurants to demonstrate its domain independence.
We also discuss in detail the simple, rule-based method of extracting relevant information from a user’s query that contributes to the domain independent dialog system. Relevant information is extracted from a user query using only dependency relations and POS tags obtained from the Stanford Dependency Parser. Using the universal dependency tags provided by the parser, we try to understand the semantic structure of a query by looking for semantically important dependents of the verb, such as the subject, direct object, prepositions and their objects, and so on. Combining the dependency relations with the inherent semantic implications of words (such as ‘who’ or ‘where’), we extract the main objective, or keyword, of the query and the constraints pertaining to it. This implementation is itself domain independent; however, mapping to a knowledge base would require some domain knowledge, and this issue is resolved using the hierarchically structured knowledge base. The keyword and constraint identification system is tested on the course management domain and the library domain.
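The kind of rule-based extraction described above can be illustrated with a minimal sketch. It assumes the query has already been parsed into (relation, head, dependent) triples; the triples below are written by hand rather than produced by the Stanford Parser, and the function name `extract`, the `WH_HINTS` table, and the three rules are hypothetical simplifications, not the rules used in the thesis.

```python
from typing import Dict, List, Tuple

# (relation, head, dependent) triples, written by hand for illustration.
Triple = Tuple[str, str, str]

# Assumed mapping from wh-words to the kind of answer they imply.
WH_HINTS = {"who": "person", "where": "location", "when": "time"}


def extract(triples: List[Triple]) -> Dict[str, object]:
    """Pick a keyword and constraints from dependency triples.

    Simplification: words are used as identifiers, so a word that occurs
    twice in the query would be ambiguous here.
    """
    # The dependent of the 'root' relation is the main verb.
    root = next(dep for rel, _, dep in triples if rel == "root")
    # Map each nominal to the preposition that marks it, if any.
    case_of = {head: dep for rel, head, dep in triples if rel == "case"}

    result = {"answer_type": None, "keyword": None, "constraints": []}
    for rel, head, dep in triples:
        if dep.lower() in WH_HINTS:
            # Rule 1: a wh-word hints at the expected answer type.
            result["answer_type"] = WH_HINTS[dep.lower()]
        elif head == root and rel in ("dobj", "nsubj") and result["keyword"] is None:
            # Rule 2: the first subject or direct object of the verb is the keyword.
            result["keyword"] = dep
        elif rel == "nmod":
            # Rule 3: prepositional modifiers become (preposition, value) constraints.
            result["constraints"].append((case_of.get(dep, ""), dep))
    return result


# "Who wrote the book about dragons?"
query = [
    ("root", "ROOT", "wrote"),
    ("nsubj", "wrote", "Who"),
    ("dobj", "wrote", "book"),
    ("det", "book", "the"),
    ("nmod", "book", "dragons"),
    ("case", "dragons", "about"),
]
parsed = extract(query)
print(parsed)
```

Nothing in the sketch refers to a particular domain: only relation names and wh-words are inspected. Mapping the keyword `book` and the constraint `dragons` to knowledge base entries is the step that needs domain knowledge.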
To further improve domain independent keyword identification for natural language queries, we used statistical methods. We took queries supplemented only by their dependency tags (Stanford Parser) and part-of-speech tags (Stanford POS tagger) and labeled the keywords. We then delexicalised the training data and used the Conditional Random Fields algorithm to learn these labels. We used the queries created by [45] in the course management domain for training, and tested our model on queries from three domains: course management, library and the GeoQueries250 dataset. We report accuracies of 90.65%, 83.19% and 97.13% respectively, which indicates that the model is a domain independent and accurate keyword identifier.
There is no agreed-upon standard for the evaluation of conversational dialog systems, which are well known to be hard to evaluate. The difficulty lies in pinning down metrics that correspond to human judgments, and in the subjective nature of human judgment itself. We explored the possibility of using Grice’s Maxims to evaluate effective communication in conversation. We collected a few system-generated dialogs from popular conversational chatbots across the spectrum and conducted a survey to see how human judgments based on Gricean maxims correlate, and whether such judgments can be used as an effective evaluation metric for dialogs.
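The delexicalisation step mentioned for the statistical keyword identifier (before CRF training) can be sketched as follows. This is an illustration under assumptions: the function name `delexicalise`, the feature names, and the one-token context window are hypothetical, and the example tags are written by hand. The idea shown is only that the word form is discarded and each token is described by its POS tag and dependency relation.

```python
from typing import Dict, List, Tuple

# (word, POS tag, dependency relation) for each token of a query.
Token = Tuple[str, str, str]


def delexicalise(tokens: List[Token]) -> List[Dict[str, str]]:
    """Turn a tagged query into per-token feature dicts without word forms."""
    features = []
    for i, (_word, pos, dep) in enumerate(tokens):
        feats = {
            "pos": pos,
            "dep": dep,
            # Context features use neighbouring POS tags, never the words.
            "prev_pos": tokens[i - 1][1] if i > 0 else "BOS",
            "next_pos": tokens[i + 1][1] if i < len(tokens) - 1 else "EOS",
        }
        features.append(feats)
    return features


# "Which courses teach Python", with hand-written tags for illustration.
tagged = [
    ("Which", "WDT", "det"),
    ("courses", "NNS", "nsubj"),
    ("teach", "VBP", "root"),
    ("Python", "NNP", "dobj"),
]
feats = delexicalise(tagged)
for f in feats:
    print(f)
```

Because no feature contains a word from the query, a sequence labeller trained on such features for one domain can in principle be applied to queries from another, which is the property the reported cross-domain results rely on.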

Full thesis: pdf

Centre for Language Technologies Research Centre

IIIT Hyderabad Publications
