A Code-Mixed Dialog System in Medical Domain

Author: Suman Dowlaqar
Date: 2023-07-15
Report no: IIIT/TH/2023/129
Advisor: Radhika Mamidi

Abstract

In the healthcare domain, interactions between medical practitioners and patients are crucial for diagnosis. Early AI models for healthcare were built only on monolingual data. Such models do not serve multilingual regions, where most conversations are code-mixed. This thesis develops a Code-Mixed Medical Task-Oriented Dialog System that helps the user consult a medical specialist based on their symptoms. We train the dialog system on the code-mixed task-oriented dialog dataset ‘Su-Vaid’. The dataset contains 3005 Telugu-English code-mixed dialogues between patients and doctors, with 29k utterances covering ten specializations and an average code-mixing index (CMI) of 33.3%. The major components of our dialog system are the Natural Language Understanding (NLU), Dialog Manager (DM), and Natural Language Generation (NLG) modules. We manually annotated the conversational dataset with intent and slot labels. We also present baselines that establish benchmarks on the dataset using existing state-of-the-art NLU models, and we improved on these baselines by using contextual ground-truth intent labels and by processing the slots as chunks. The dialog system uses a rule-based Dialog Manager and a template-based NLG module. While interacting with the system, the user may fail to mention a few symptoms. In such cases, the dialog system must remind the user by suggesting relevant symptoms, so we have incorporated a suggestion system into our dialog system. Beyond recommending a medical specialist, the dialog system should also be more human-like (empathetic), since human-human conversations in the healthcare domain are mostly empathetic. To achieve this behavior, we have included empathy via language accommodation in our dialog system.
Interactions between patients and medical practitioners can also be challenging. Health care workers are at greater risk of workplace abuse than most other workers, with nurses and family physicians rated as most at risk of abusive encounters with patients. Abusive encounters range in severity from verbal threats to more extreme cases such as stalking and physical assault. Hate speech and physical abuse must be strongly condemned for health care to function well. The system must identify such conversations so that it can take the necessary action against such users, so we have included hate speech detection in our dialog system. Finally, we evaluated all the major components of our dialog system using state-of-the-art baselines and classification metrics, and we integrated the top-performing models into the final version of the system.
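
The abstract reports an average code-mixing index (CMI) of 33.3% but does not say how it was computed. As an illustration only, the sketch below uses one commonly cited utterance-level definition, CMI = 100 * (1 - max_lang / (n - u)), where n is the number of tokens, u is the number of language-independent tokens, and max_lang is the token count of the dominant language. The tag names ("te", "en", "univ") and the function name are hypothetical, and the thesis may use a different variant or tagging scheme.

```python
from collections import Counter


def code_mixing_index(lang_tags, neutral_tag="univ"):
    """Utterance-level CMI in percent for a list of per-token language tags.

    Assumption: tokens tagged `neutral_tag` (punctuation, numbers, names)
    are language-independent and are excluded from the language counts.
    """
    n = len(lang_tags)
    # Count tokens per language, skipping language-independent tokens.
    counts = Counter(tag for tag in lang_tags if tag != neutral_tag)
    u = n - sum(counts.values())
    if n == u:
        # No language-tagged tokens at all: define CMI as 0.
        return 0.0
    dominant = max(counts.values())
    return 100.0 * (1.0 - dominant / (n - u))


# Hypothetical tagged utterance: 3 Telugu, 2 English, 1 neutral token.
mixed = ["te", "te", "en", "univ", "en", "te"]
monolingual = ["te", "te", "te"]

cmi_mixed = code_mixing_index(mixed)
cmi_mono = code_mixing_index(monolingual)
```

Under this definition a monolingual utterance scores 0, and a corpus-level figure such as the one quoted above would typically be an average over utterances.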

Full thesis: pdf

Centre: Language Technologies Research Centre

IIIT Hyderabad Publications
