IIIT Hyderabad Publications
Towards Consistent and Informative Timeline Generation

Author: Priyank Modi
Date: 2024-07-06
Report no: IIIT/TH/2024/156
Advisor: Manish Shrivastava

Abstract

Understanding temporal relationships between events forms an important part of document analysis and helps in many downstream tasks, such as Question Answering (QA) and Timeline Generation. Timeline generation is central to NLP because it enables the reconstruction and visualization of the chronological sequence of events within textual data, and it is a foundational step toward coherent document understanding. However, existing temporal relation extraction models face notable limitations due to issues prevalent in annotated datasets. These include ambiguous annotation guidelines that lead to low inter-annotator agreement, the omission of long-distance relations across document sections, and a narrow focus on verb-centered events. In response, this thesis introduces an approach for creating a comprehensive discourse-level temporal event ordering dataset. Central to our methodology is the shift from classifying relations between event pairs in a local context to constructing discourse-level timelines. Our approach incorporates the concept of multiple timelines within a discourse, distinguishing between the actual timeline, which contains realized events, and hypothetical timelines, which contain potential occurrences. To support this, we developed DELTA 2.0 (DiscoursE Level event Timeline Annotation) by re-annotating the TimeBank-Dense dataset. This effort resulted in a substantial increase in the inter-annotator agreement score and in the number of meaningful relations captured, surpassing the scope of existing datasets that focus solely on local temporal relations. Furthermore, to make timeline annotation more efficient, we devised a user-friendly annotation tool.
Using this annotated dataset, we trained a publicly available adapted version of the state-of-the-art model, TIMERS, and showed its effectiveness in predicting discourse-level event temporal relations. Additionally, through a comprehensive reproducibility study, we evaluated leading models for event-event temporal relation classification. Our analysis of language-model-based architectures such as BERT and RoBERTa revealed promising outcomes. Moreover, integrating Graph Neural Networks into our methods enhanced the representation of temporal information, substantially improving the accuracy of the generated timelines. This work marks a shift towards discourse-level analysis in temporal relation extraction from news articles. By establishing DELTA 2.0 and integrating neural network methods, this research lays a foundation for advancing the understanding of temporal relations in natural language processing. Such advancements have the potential to significantly impact document understanding and a wide array of practical applications in the field. Finally, we extend our work to Indic languages via a TimeML-compliant Hindi Timebank.

Full thesis: pdf

Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.