IIIT Hyderabad Publications
Leveraging Human-Centered Explanations for Model Improvement and Evaluation

Author: Avani Gupta (2019121004)
Date: 2023-09-30
Report no: IIIT/TH/2023/157
Advisor: P J Narayanan

Abstract

Neural Networks are known to be black-box models, and interpretability-based approaches are used to explain them. ML interpretability methods explain the complex reasoning of Neural Networks in a human-understandable form. Humans think in abstract concepts like color, texture, and shape. Explaining black-box models in these simple concepts aids human understanding of models, leading to more transparency, reliability, and proactive identification of risks and biases. Recent interpretability methods have started using concepts to explain complex models in simple, human-understandable terms. In computer vision, a concept is defined as a set of images sharing a human-understandable meaning (e.g., striped images form the stripiness concept). Current concept-based interpretability methods use the encoding of concepts in an intermediate layer of the model to define concept representations, typically as Concept Activation Vectors (CAVs). They further use these representations by perturbing model activations in the intermediate layer and gauging the model's sensitivity to the perturbations. We use these CAV-based sensitivity calculations to evaluate desired properties by using the very definitions from the problem as concepts. We further extend the use of CAVs from post-hoc analysis to ante-hoc training via our novel concept loss and distillation paradigm.

Concretely, we present a concept-sensitivity-based method to measure disentanglement in the ill-posed, ambiguous problem of Intrinsic Image Decomposition (IID), followed by a novel method for concept-based training of DNNs that utilizes conceptual knowledge from large pre-trained models in a distillation paradigm. IID involves decomposing an image into its constituent Reflectance and Shading components, which are illumination-invariant and albedo-invariant, respectively. We use this definition of IID to define our concepts and propose using the sensitivity scores to directly measure the alignment of a model's predictions with the definition. For this, we measure the illumination invariance of the Reflectance prediction and the albedo invariance of the Shading prediction by gauging the model's sensitivity to the relevant concepts. We thus define the evaluation of IID in abstract, human-centered concepts. We define the Concept Sensitivity Metric (CSM) to measure the disentanglement of Reflectance and Shading in a model's predictions for evaluating IID methods. We evaluate and interpret three recent IID methods on our synthetic benchmark of controlled albedo- and illumination-invariance sets. We also compare our metric with existing IID evaluation metrics on both natural and synthetic scenes and report our observations. Our metric not only overcomes several limitations of the existing metrics but is also consistent across synthetic and real-world datasets.

Concept-based interpretability methods are post-hoc (applied after training) and can be used to analyze models. We aim to use the feedback provided by these post-hoc methods to train the model for further improvement in an ante-hoc manner. For this, we propose a novel concept loss based on CAV sensitivity. We argue that learning CAVs in the same model is not efficient and propose to use a separate model having knowledge of concepts in a knowledge distillation paradigm. We present Concept Distillation, a novel method for concept-sensitive training of Deep Neural Networks.
Concept Distillation can be used to sensitize or desensitize the student model towards user-desired concepts. We show applications of our concept-sensitivity-based training in debiasing classification problems and in inducing priors for IID. We also introduce the TextureMNIST dataset to evaluate the presence of complex texture biases. Our concept-sensitive training can improve model interpretability, reduce biases, and induce prior knowledge.
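The CAV-based sensitivity referred to in the abstract follows the Concept Activation Vector framework (TCAV, Kim et al. 2018). Below is a minimal PyTorch sketch of its two ingredients: learning a CAV as the normal of a linear classifier that separates concept activations from random activations in an intermediate layer, and gauging sensitivity as the directional derivative of a chosen output along that CAV. The names `layer_activations`, `concept_imgs`, `random_imgs`, and `output_fn` are illustrative placeholders, not the thesis's API.

```python
# Minimal sketch of CAV learning and CAV-based sensitivity (TCAV-style).
# `model`, `layer`, `concept_imgs`, `random_imgs`, `output_fn` are assumptions.
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression


def layer_activations(model, layer, x):
    """Run `model` on `x` and return the flattened activations of `layer`."""
    store = {}
    handle = layer.register_forward_hook(lambda m, inp, out: store.update(a=out))
    model(x)
    handle.remove()
    return store["a"].flatten(start_dim=1)


def compute_cav(model, layer, concept_imgs, random_imgs):
    """CAV = unit normal of a linear classifier separating concept vs. random
    activations in the chosen intermediate layer."""
    with torch.no_grad():
        pos = layer_activations(model, layer, concept_imgs).cpu().numpy()
        neg = layer_activations(model, layer, random_imgs).cpu().numpy()
    X = np.concatenate([pos, neg])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = torch.tensor(clf.coef_[0], dtype=torch.float32)
    return cav / cav.norm()


def concept_sensitivity(model, layer, cav, x, output_fn):
    """Directional derivative of `output_fn(model(x))` along the CAV at `layer`.
    Positive values mean the output grows as activations move towards the concept."""
    store = {}
    handle = layer.register_forward_hook(lambda m, inp, out: store.update(a=out))
    out = output_fn(model(x)).sum()          # per-example scalars, summed over batch
    handle.remove()
    grad = torch.autograd.grad(out, store["a"])[0].flatten(start_dim=1)
    return grad @ cav.to(grad.device)        # one sensitivity score per input image
```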
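For IID, the abstract defines evaluation through illumination invariance of Reflectance and albedo invariance of Shading. The sketch below shows, hypothetically, how such a check could reuse the helpers above; the thesis's actual Concept Sensitivity Metric (CSM) may aggregate or normalize the scores differently, and the dictionary-style model output is an assumption made here for illustration.

```python
# Hypothetical disentanglement check for an IID model, reusing compute_cav and
# concept_sensitivity from the sketch above. Assumes iid_model(x) returns a dict
# with "reflectance" and "shading" maps; the thesis's exact CSM may differ.
def iid_concept_sensitivities(iid_model, layer, images,
                              illum_concept_imgs, albedo_concept_imgs, random_imgs):
    cav_illum = compute_cav(iid_model, layer, illum_concept_imgs, random_imgs)
    cav_albedo = compute_cav(iid_model, layer, albedo_concept_imgs, random_imgs)

    # Reflectance should be insensitive to illumination changes,
    # Shading should be insensitive to albedo changes.
    s_refl = concept_sensitivity(
        iid_model, layer, cav_illum, images,
        output_fn=lambda out: out["reflectance"].mean(dim=(1, 2, 3)))
    s_shad = concept_sensitivity(
        iid_model, layer, cav_albedo, images,
        output_fn=lambda out: out["shading"].mean(dim=(1, 2, 3)))

    # Lower magnitudes indicate better Reflectance/Shading disentanglement.
    return s_refl.abs().mean().item(), s_shad.abs().mean().item()
```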
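The concept loss and Concept Distillation are described only at a high level in the abstract. The sketch below shows one plausible way to turn CAV sensitivity into a differentiable training signal for sensitizing or desensitizing a student model; the actual formulation in the thesis, including how the teacher's concept knowledge is mapped into the student's feature space, may differ. `student_acts`, `task_out`, and `lambda_c` are assumptions.

```python
# Hypothetical concept loss built on CAV sensitivity. `cav` is assumed to be a
# unit vector in (or mapped into) the student's activation space, obtained from
# a separate concept-aware teacher model, as the abstract describes.
import torch
import torch.nn.functional as F


def concept_loss(student_acts, task_out, cav, desensitize=True):
    """Penalize (desensitize) or encourage (sensitize) the alignment between the
    task gradient at an intermediate layer and the concept direction."""
    grad = torch.autograd.grad(task_out.sum(), student_acts, create_graph=True)[0]
    grad = grad.flatten(start_dim=1)
    cos = F.cosine_similarity(grad, cav.unsqueeze(0).expand_as(grad), dim=1)
    return cos.abs().mean() if desensitize else (1.0 - cos).mean()


# Usage sketch inside a training step (lambda_c weights the concept term):
#   acts = <layer output captured with a forward hook>
#   task_out = <per-example task output, e.g. the target-class logit>
#   loss = task_loss + lambda_c * concept_loss(acts, task_out, cav)
#   loss.backward()
```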
Full thesis: pdf

Centre for Visual Information Technology