IIIT Hyderabad Publications
Data Augmentation For Probing Language Understanding Models

Author: Atreyee Ghosal (20161167)
Date: 2022-12-29
Report no: IIIT/TH/2022/166
Advisor: Manish Shrivastava

Abstract: Deep learning models perform very well at inference on benchmark NLP tasks. Because they perform well on these tasks (tasks that, like the GLUE suite, are designed to test a model's language understanding and reasoning skills), we hypothesize that the models are, in fact, understanding and reasoning based on the evidence. In this dissertation, we test that hypothesis by designing a series of probes for evidence-based reasoning in the context of the tabular NLI task. The particular models examined are BERT-based tabular NLI models representative of the state of the art in tabular NLI. A problem faced in probe creation is the time and effort required to manually create probing datasets for multiple tasks; in this dissertation, we also detail the use of data augmentation techniques to create such probing datasets. In total, we create datasets to probe for: (a) prioritization of evidence, (b) robustness to annotation artefacts, and (c) reliance on pre-trained world knowledge over presented evidence. We find that the class of models studied displays the following deviations from expected behaviour: it (a) ignores relevant parts of the evidence, (b) is over-sensitive to annotation artefacts, and (c) relies on the knowledge encoded in the pre-trained language model rather than on the evidence presented in its tabular inputs. Finally, through inoculation experiments, we show that fine-tuning the model on challenging data does not help it overcome these behavioural deficiencies.

Full thesis: pdf

Centre for Language Technologies Research Centre
Copyright © 2009 - IIIT Hyderabad. All Rights Reserved.