IIIT Hyderabad Publications
A Dialog Act Tagger for Telugu
Report no: IIIT/TH/2016/20
In a task-oriented domain, recognizing the intention of a speaker is important so that the conversation can proceed in the correct direction. This is possible only if there is a way to label each utterance with its proper intent. One such labeling technique is Dialog Act (DA) tagging. The main goal of this thesis is to build a DA tagger for a Telugu corpus. The work discusses various n-gram DA tagging techniques for tagging Telugu data. The n-gram DA tagging methods proposed earlier for English, which follows a strict subject–verb–object (SVO) order, will not work for free word order languages like Telugu. This thesis explains in detail two DA tagging methods for free word order languages, Telugu in particular. We propose a method to perform DA tagging for the Telugu corpus using advanced machine learning techniques combined with karaka dependency relation modifiers. For free word order languages like Telugu, karaka dependencies help in extracting the modifier-modified relationships between words or word clusters in an utterance. These relationships remain fixed even when the word order of the utterance changes, and the extracted relationships resemble n-grams. Statistical machine learning methods are then applied to predict the DA of an utterance in a dialog. The first method uses n-gram karakas with back-off as the n-gram language modeling technique at the n-gram level and Memory Based Learning at the utterance level. The second method uses syntactic features such as anaphora resolution and conjunct identification, and automatically extracts n-gram karakas using modifier-modified relationship rules; it then applies language modeling (LM) with a Hidden Markov Model (HMM) for DA tagging. The proposed methods are compared with several baseline tagging algorithms. The results show that the proposed methods perform DA tagging better for free word order languages like Telugu.
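The claim that modifier-modified relationships stay fixed under reordering, while surface n-grams do not, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the romanized toy utterance, the hand-assigned heads, the use of the labels `k1` and `k2` for the two arguments of the verb, and the helper names `word_bigrams` and `dependency_pairs`. It is not the thesis's extraction procedure or data.

```python
from typing import FrozenSet, List, Tuple

# A token is (word, head_word, relation). The heads and relation labels are
# hand-assigned for this toy example (assumption); "ROOT" marks the main verb.
Token = Tuple[str, str, str]


def word_bigrams(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Surface bigrams: adjacent word pairs, which depend on word order."""
    words = [word for word, _, _ in tokens]
    return list(zip(words, words[1:]))


def dependency_pairs(tokens: List[Token]) -> FrozenSet[Tuple[str, str, str]]:
    """Modifier-modified pairs (modifier, relation, head), ignoring position."""
    return frozenset(
        (word, relation, head)
        for word, head, relation in tokens
        if head != "ROOT"
    )


# Two orderings of the same toy utterance (illustrative, hand-annotated).
order_a: List[Token] = [
    ("ramudu", "chadivadu", "k1"),
    ("pustakam", "chadivadu", "k2"),
    ("chadivadu", "ROOT", "main"),
]
order_b: List[Token] = [
    ("pustakam", "chadivadu", "k2"),
    ("ramudu", "chadivadu", "k1"),
    ("chadivadu", "ROOT", "main"),
]

# Surface bigrams change when the words are reordered...
print(word_bigrams(order_a) == word_bigrams(order_b))
# ...while the set of modifier-modified pairs is unchanged.
print(dependency_pairs(order_a) == dependency_pairs(order_b))
```

In this sketch the dependency pairs play the role of order-independent n-gram-like features; how such features are fed into back-off language modeling, Memory Based Learning, or an HMM is described in the thesis itself and is not reproduced here.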
Full thesis: pdf
Language Technologies Research Centre