Developing a Dialogue System for Telugu, a Resource Poor Language

Author: MULLAPUDI CHAITANYA SRAVANTHI 201002150
Date: 2022-07-01
Report no: IIIT/TH/2022/104
Advisor: Radhika Mamidi

Abstract

Dialogue systems help us access information and services more effectively. This thesis presents work on developing Interactive Question Answering systems, a type of dialogue system, for English and Telugu. The domain chosen for English was Hyderabad MMTS enquiry, and for Telugu it was Hyderabad Tourism. Both systems take text as input and produce text as output. To study a wider range of user behaviour, the two systems were developed for different domains and languages.

The MMTS Question Answering system follows the frame-based approach. This model can handle spelling errors and can also deal with insufficient information in the user query. In this system, the data is stored in SQL databases. We convert the user query into a proper SQL query and extract the required information from the database.

We tried to port this model to Telugu for the Tourism domain, but the process failed because Telugu lacks proper language processing tools, so we had to design a new approach. The approach used for Telugu can be extended to other resource-poor Indian languages as well. The dialogue model consists of two parts, Data Management and Query Processing. Data Management deals with storing the data in a format that allows easy and quick retrieval of the requested information. Query Processing deals with producing a relevant system response to a user query. Our model can handle code-mixed queries, which are very common in Indian languages, and it also handles context, which is a major challenge in dialogue systems. It also handles spelling mistakes and a few grammatical errors. The model is domain and language independent. Using this model, we developed a system for Telugu for the 'Tourist places of Hyderabad' domain. As no automated evaluation tool is available for dialogue systems, we evaluated our system with human judges. Five people evaluated our system, and the results are reported in the thesis.

In this thesis, we have also attempted to extract adjacency pairs from conversational units with multiple utterances from each user, using dialogue acts. Adjacency pairs help in the automatic extraction of dialogue structure, which is useful when utterances carry minimal information. The machine learning algorithms we were using for Natural Language Generation relied, as training data, on conversational units in which each speaker utters only one sentence per turn. This restricts the data, and a good amount of manual work goes into converting these conversational units into adjacency pairs. To ease this process, we have tried to automate the extraction of adjacency pairs from the data using dialogue acts.
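
As a rough illustration of the frame-based approach described for the MMTS system, the sketch below fills a small frame from a text query, tolerates minor spelling errors, asks for a missing slot, and builds a parameterized SQL query. It is not the thesis implementation: the station list, the `trains` table schema, the slot names, and the functions `fill_frame`, `next_prompt`, and `to_sql` are all assumptions made for this example.

```python
import difflib
import sqlite3

# Hypothetical station list; the actual MMTS data used in the thesis is not shown here.
STATIONS = ["Lingampalli", "Begumpet", "Secunderabad", "Falaknuma", "Nampally"]


def fill_frame(query):
    """Fill a two-slot frame (source, destination) from a free-text query."""
    frame = {"source": None, "destination": None}
    by_lower = {name.lower(): name for name in STATIONS}
    words = query.lower().split()
    for i, word in enumerate(words):
        # Fuzzy matching lets a misspelt station name still resolve.
        match = difflib.get_close_matches(word, list(by_lower), n=1, cutoff=0.8)
        if not match:
            continue
        station = by_lower[match[0]]
        previous = words[i - 1] if i > 0 else ""
        if previous == "from":
            frame["source"] = station
        elif previous == "to":
            frame["destination"] = station
    return frame


def next_prompt(frame):
    """Return a follow-up question for the first empty slot, or None if the frame is full."""
    if frame["source"] is None:
        return "Which station are you starting from?"
    if frame["destination"] is None:
        return "Which station are you going to?"
    return None


def to_sql(frame):
    """Turn a complete frame into a parameterized SQL query (assumed schema)."""
    sql = (
        "SELECT train_no, departure FROM trains "
        "WHERE source = ? AND destination = ? ORDER BY departure"
    )
    return sql, (frame["source"], frame["destination"])


if __name__ == "__main__":
    # In-memory database with one made-up row, only to exercise the query.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trains (train_no TEXT, source TEXT, destination TEXT, departure TEXT)"
    )
    conn.execute("INSERT INTO trains VALUES ('T1', 'Begumpet', 'Secunderabad', '09:15')")

    frame = fill_frame("trains from begumpt to secunderabad")
    question = next_prompt(frame)
    if question is None:
        sql, params = to_sql(frame)
        print(conn.execute(sql, params).fetchall())
    else:
        print(question)
```

In this sketch the frame drives the dialogue: the system keeps asking until every slot is filled, and only then queries the database. Passing the slot values as parameters rather than pasting them into the SQL string keeps user text out of the query itself.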

Full thesis: pdf

Centre: Language Technologies Research Centre

IIIT Hyderabad Publications
