IIIT Hyderabad Publications
Evaluating and Enhancing the Robustness of Math Word Problem Solvers

Author: Vivek Kumar 2019701004
Date: 2023-06-21
Report no: IIIT/TH/2023/98
Advisor: Vikram Pudi

Abstract

In the recent past, math word problem (MWP) solvers have received wide attention from the NLP community. With the advancement of deep learning techniques, solvers have started to show better performance than traditional rule-based semantic parsing techniques. Standard accuracy metrics indicate that MWP solvers achieve high performance on benchmark datasets. However, these results are based on datasets with a limited number of problem statements and equation templates, which offer very limited diversity and a low probability of generalizing to different word problems in practice. Hence, the extent to which existing MWP solvers truly understand natural language problem statements and their relationship with numerical quantities is still unclear. In this work, we first generate adversarial attacks to evaluate the robustness of state-of-the-art MWP solvers. We propose two methods, Question Reordering and Sentence Paraphrasing, to generate adversarial attacks. We conduct experiments across three neural MWP solvers over two benchmark datasets. On average, our attack methods reduce the accuracy of MWP solvers by over 40 percentage points on these datasets. Our results demonstrate that existing MWP solvers are sensitive to linguistic variations in the problem text. We verify the validity and quality of the generated adversarial examples through human evaluation. These results show that math word problem solvers do not generalize well and rely on superficial cues to achieve high performance. Next, we conduct experiments to show that this behaviour is mainly associated with the limited size and diversity of existing MWP datasets.
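The abstract names Question Reordering but does not describe how it works. As a rough illustration only, the sketch below assumes the attack moves the question sentence to the front of the problem text while leaving every quantity unchanged. The function names `split_sentences` and `reorder_question` are hypothetical, and the regex-based sentence splitter is a simplification that would mis-split abbreviations such as "Mr."; it is not the procedure used in the thesis.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break at whitespace that follows '.', '?' or '!'.

    Decimals such as "3.5" survive because no whitespace follows the dot.
    """
    parts = re.split(r"(?<=[.?!])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def reorder_question(problem: str) -> str:
    """Assumed form of Question Reordering: put question sentences first.

    The numbers and the remaining sentences keep their original order,
    so the underlying equation and answer stay the same.
    """
    sentences = split_sentences(problem)
    questions = [s for s in sentences if s.endswith("?")]
    if not questions:
        # Nothing to move: return the problem text unchanged.
        return problem
    context = [s for s in sentences if not s.endswith("?")]
    return " ".join(questions + context)


if __name__ == "__main__":
    original = (
        "Tim has 5 apples. He buys 3 more apples. "
        "How many apples does Tim have now?"
    )
    print(reorder_question(original))
```

A robust solver should return the same answer for the original and the reordered text, since only the sentence order differs.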
We modify the problem statements by altering the text in different settings, such that either the problem statement does not make much sense or no question is asked in it. The preliminary results from these analyses did not show the significant drop in accuracy that was expected. We then propose several data augmentation techniques, broadly categorized into Substitution-based and Paraphrasing-based methods, to mitigate the issues found by our analysis. By deploying these data augmentation methods, we increase the size of existing datasets fivefold. Extensive experiments on two benchmark datasets across three state-of-the-art MWP solvers show that the proposed methods increase the generalization and robustness of existing solvers. On average, the proposed methods improve the state-of-the-art results by over five percentage points on benchmark datasets. Further, the solvers trained on the augmented dataset perform comparatively better on the challenge test set. We also show the effectiveness of the proposed techniques through ablation studies and verify the quality of the augmented samples through human evaluation.

Full thesis: pdf

Centre for Data Engineering