IIIT Hyderabad Publications
Towards detection and explanation of factual inconsistencies

Author: Tathagata Raha
Date: 2024-05-16
Report no: IIIT/TH/2024/68
Advisor: Vasudeva Varma

Abstract

Factual inconsistencies in text, which range from minor inaccuracies to substantial distortions, pose a significant challenge for information dissemination. Whether unintentional or the result of deliberate misinformation, these inconsistencies can lead to a skewed understanding and flawed decision-making. Given the vast and complex landscape of digital data, the limitations of traditional verification methods become evident, highlighting the need for more advanced solutions. In this thesis, we present and explore three critical problems in this domain, each addressing a distinct aspect of content credibility and factual consistency.

The first problem is the detection of hostility in online content, specifically in Hindi tweets. Our approach categorizes these tweets into distinct hostile classes: hateful, offensive, defamatory, or fake. We employ pretrained Transformer-based models, particularly IndicBERT, which is well suited to Hindi text because it was trained on a large corpus of Indian languages. Our model architecture uses information from emojis and hashtags in addition to the natural language text. Task Adaptive Pretraining (TAPT) yielded a significant gain in performance: increases of 1.35% and 1.40% in binary hostility detection, and improvements of 4.06% and 1.05% in macro and weighted F1, respectively, for fine-grained classification. Notably, our system, under the team name ‘iREL IIIT’, achieved first place in the ‘Hostile Post Detection in Hindi’ shared task at the CONSTRAINT-2021 workshop.
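The distinction between the macro and weighted F1 metrics reported above can be illustrated with a minimal sketch. It assumes a simplified single-label setting with toy labels; the function names and the data are hypothetical and are not taken from the thesis, whose actual evaluation setup may differ.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true examples per label
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by class support (true example count)."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(s for _, s in scores.values())
    return sum(f1 * s for f1, s in scores.values()) / total


# Toy, imbalanced example (hypothetical data, not from the thesis).
y_true = ["fake", "fake", "fake", "fake", "hate", "offensive"]
y_pred = ["fake", "fake", "fake", "fake", "offensive", "offensive"]

print(round(macro_f1(y_true, y_pred), 4))
print(round(weighted_f1(y_true, y_pred), 4))
```

Because macro F1 gives rare classes the same weight as frequent ones, an improvement concentrated in minority classes tends to move macro F1 more than weighted F1.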

The second problem is automated fact extraction and verification, a pressing challenge in a digital landscape rife with misinformation. Our approach builds on Fact Extraction and Verification (FEVER), which assesses the veracity of claims against a comprehensive body of evidence from Wikipedia. Our contributions include a specialized retrieval model tailored to the FEVER dataset, a strategic approach to sentence selection for evidence gathering, and an exploration of advanced natural language inference (NLI) models, particularly state-of-the-art Transformer models. By integrating these components, we refine the process of recognizing textual entailment and significantly enhance the accuracy and efficiency of automated fact-checking.

The third problem is the detection and explanation of factual inconsistencies in text, a significant challenge in the era of advanced Transformer-based natural language generation models. These models, while adept at tasks like summarization and translation, often produce hallucinated and inconsistent content. We introduce the novel Factual Inconsistency Classification with Explanations (FICLE) method. This technique analyzes sentence pairs in detail to identify inconsistency types and provide comprehensive explanations, including inconsistent fact triples, context spans, and entity types. Central to our approach is the FICLE dataset, extensively annotated to cover a range of inconsistency types and their explanations. Using this dataset, we developed a pipeline of four neural models, each designed to focus on specific facets of inconsistency detection and explanation.
These models use various Transformer-based NLU and NLG architectures, with DeBERTa showing notable effectiveness across most sub-tasks. Our results demonstrate high performance in inconsistency-type classification and entity-type prediction, with weighted F1 scores of around 87% and 86%, respectively. The detection of context spans proved more challenging, achieving an Intersection over Union (IoU) of approximately 65%. The contributions of this research are multi-fold: proposing a new problem in factual inconsistency detection, creating a novel dataset, and establishing a baseline pipeline with high performance in inconsistency-type classification and entity-type prediction.

Full thesis: pdf
Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.