IIIT Hyderabad Publications
Multilingual spoken dialog systems for handheld devices

Author: Brij Mohan Lal
Date: 2017-06-24
Report no: IIIT/TH/2017/37
Advisor: Manish Shrivastava

Abstract

Technological advancements have made human beings dependent on machines in unprecedented ways. High precision in many tasks and a vast, tractable memory confirm their place as an integral part of the human lifestyle. Human-machine interaction is therefore the evident next step in developing smarter systems and, consequently, a research topic of foremost importance. Effective human communication is achieved by means of natural language, and hence language becomes an obvious necessity for human-machine communication as well. The large number of languages spoken in a multi-ethnic society makes the development of multilingual systems a necessity. India is one such region, inhabited by highly diverse multicultural societies and a multitude of ethnic groups living as one nation. An effective Spoken Dialog System (SDS) developed for the Indian scenario must account for the shared characteristics of languages spoken across India. This motivates the need to develop multilingual human-machine interfaces that can exploit this commonality.

Portability and ever-increasing computing capacity have made mobile phones the most common form of communication agent. In the absence of network connectivity, however, they lose much of the power residing in cloud platforms and must rely on local resources, which are limited and scarce. This problem is widespread: there are many locations and situations where the network is unavailable but basic communication is crucial for information retrieval. One such situation is healthcare for preventable diseases in rural areas. This thesis aims to alleviate some of the problems that arise when humans interact with handheld conversational machine agents in a multilingual setting such as India.
With the advent of social media, code-mixing has become a popular phenomenon in spoken language. This work develops a spoken dialog system for the healthcare domain and focuses on two problems: 1. recognizing the language of communication within speech segments through large-scale spoken language modeling, and 2. spotting keywords in the speech signal by creating a robust representation of articulatory gestures in speech. The proposed approaches are based on recurrent and convolutional neural network architectures and have been experimentally shown to outperform the state of the art on standard datasets.

Full thesis: pdf

Centre for Language Technologies Research Centre
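To make the first problem concrete, the sketch below shows the general shape of recurrent spoken language identification: a recurrent layer consumes a sequence of acoustic feature frames and the final hidden state is mapped to a softmax over candidate languages. This is a minimal illustration in NumPy, not the thesis implementation; all dimensions, weights, and the `language_id` function are hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: e.g. 13 MFCC coefficients per frame,
# a small hidden state, and 3 candidate languages.
feat_dim, hidden_dim, n_langs = 13, 8, 3
W_xh = rng.standard_normal((hidden_dim, feat_dim)) * 0.1   # input-to-hidden
W_hh = rng.standard_normal((hidden_dim, hidden_dim)) * 0.1 # hidden-to-hidden
W_hy = rng.standard_normal((n_langs, hidden_dim)) * 0.1    # hidden-to-output

def language_id(frames):
    """Run a vanilla RNN over acoustic frames; classify from the last state."""
    h = np.zeros(hidden_dim)
    for x in frames:                      # one feature frame per time step
        h = np.tanh(W_xh @ x + W_hh @ h)  # recurrent state update
    logits = W_hy @ h
    e = np.exp(logits - logits.max())     # numerically stable softmax
    return e / e.sum()

utterance = rng.standard_normal((50, feat_dim))  # 50 dummy frames
probs = language_id(utterance)
print(probs)  # one probability per candidate language, summing to 1
```

A trained system would learn these weights from labeled speech and typically use gated recurrent cells rather than a vanilla RNN, but the data flow (frames in, per-language posterior out) is the same.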
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.