IIIT Hyderabad Publications
Resolution of Pronominal Anaphora for Telugu Dialogues

Author: HEMANTH REDDY JONNALAGADDA 201002100
Date: 2024-07-02
Report no: IIIT/TH/2024/154
Advisor: Radhika Mamidi

Abstract

The challenge of anaphora resolution has been a prominent research topic within the field of natural language processing (NLP) for many years. Anaphora refers to the linguistic phenomenon where a word or phrase refers back to another word or phrase previously mentioned in the discourse. Resolving these references accurately is crucial for understanding and generating coherent text. While significant progress has been made in anaphora resolution for several languages, there is a notable gap in research focusing specifically on dialogues in Telugu, a Dravidian language spoken primarily in India. Telugu presents unique challenges for anaphora resolution due to its rich morphology and free word order. These characteristics complicate the identification of antecedents for pronominal references, which is essential for accurate language understanding.

This thesis addresses these challenges by presenting a rule-based algorithm designed for pronominal anaphora resolution in Telugu human-to-human conversations. The proposed algorithm consists of two main components: the creation of a comprehensive knowledge base and the development of a set of rules for resolving anaphora. The knowledge base includes an extensive list of Telugu pronouns along with their morphological information, which is crucial for understanding the grammatical relationships within sentences. The rule-based component leverages this knowledge base to identify and resolve pronominal references accurately.

A significant contribution of this research is the development of a new annotated corpus for Telugu dialogues. Due to the lack of existing resources, a new corpus was built, comprising 108 human-to-human conversations with a total of 509 pronouns.
The corpus was annotated manually to ensure high-quality data for evaluating the algorithm. The performance of the algorithm was tested on this corpus, and the results demonstrate its effectiveness. The algorithm achieved an overall accuracy of 61.1%, with the highest accuracy observed for 1st person pronouns (81.88%) and the lowest for 3rd person pronouns (43.56%). These results highlight the complexities involved in resolving 3rd person pronouns in Telugu dialogues and suggest areas for further improvement.

In conjunction with this research, we initially developed an interactive question answering system for the Hyderabad Multi-Modal Transport System (MMTS). Existing applications for querying train arrival times often rely on dropdown menus for selecting 'FROM Station' and 'TO Station', which then display a comprehensive list of train schedules. These systems fall short in catering to users seeking specific time periods and additional contextual information. Our project addressed these limitations by creating a user-friendly interface that leverages advanced natural language processing techniques to interpret and respond to user queries. The system employs a frame-based approach, coupled with robust parsing and spellchecking mechanisms, to accurately handle user inputs, including those with spelling errors and code-mixed language. This innovative solution not only showcases our technical expertise in dialogue systems but also significantly enhances the commuter experience by providing precise and relevant train schedule information for Hyderabad MMTS users.

This work represents a significant step towards improving natural language processing for Telugu, with potential applications in dialogue systems, text summarization, machine translation, and other NLP tasks. The findings of this research contribute to the broader field of anaphora resolution and provide a foundation for future studies focusing on Telugu and other morphologically rich languages.
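
As a rough illustration of the two-component design described in the abstract (a pronoun knowledge base plus resolution rules), the following Python sketch may help. It is not the thesis's algorithm: the four romanized pronoun entries, the `Mention` type, and the three rules (1st person maps to the speaker, 2nd person to the addressee, 3rd person to the most recent mention agreeing in number and gender) are simplifying assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class PronounEntry:
    """Morphological features stored for one pronoun (hypothetical schema)."""
    person: int                   # 1, 2, or 3
    number: str                   # "sg" or "pl"
    gender: Optional[str] = None  # "m", "f", or None when unmarked


# Tiny illustrative knowledge base with romanized Telugu pronouns.
# A real knowledge base would be far larger and cover case-marked forms.
KNOWLEDGE_BASE = {
    "nenu": PronounEntry(person=1, number="sg"),               # "I"
    "nuvvu": PronounEntry(person=2, number="sg"),              # "you"
    "atanu": PronounEntry(person=3, number="sg", gender="m"),  # "he"
    "aame": PronounEntry(person=3, number="sg", gender="f"),   # "she"
}


@dataclass(frozen=True)
class Mention:
    """A candidate antecedent seen earlier in the dialogue (hypothetical)."""
    text: str
    number: str
    gender: Optional[str]


def resolve(pronoun: str, speaker: str, addressee: str,
            prior_mentions: Sequence[Mention]) -> Optional[str]:
    """Return the assumed referent of `pronoun`, or None if no rule applies."""
    entry = KNOWLEDGE_BASE.get(pronoun)
    if entry is None:
        return None  # not a known pronoun
    if entry.person == 1:
        return speaker    # assumed rule: 1st person refers to the speaker
    if entry.person == 2:
        return addressee  # assumed rule: 2nd person refers to the addressee
    # Assumed rule: 3rd person picks the most recent agreeing mention.
    for mention in reversed(prior_mentions):
        if mention.number == entry.number and mention.gender == entry.gender:
            return mention.text
    return None


if __name__ == "__main__":
    mentions = [Mention("Ravi", "sg", "m"), Mention("Sita", "sg", "f")]
    print(resolve("atanu", "A", "B", mentions))
```

In this sketch the 1st and 2nd person rules need only the speaker roles, while the 3rd person rule has to search the dialogue history, which is one plausible reason 3rd person pronouns are harder to resolve in dialogue.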
Full thesis: pdf
Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.