IIIT Hyderabad Publications
Making Deep Models Generalizable using Domain Adaptation

Author: Abhay Rawat
Date: 2022-11-21
Report no: IIIT/TH/2022/163
Advisors: Kamalakar Karlapalem, Rahul Tallamraju

Abstract

Supervised deep learning methods hinge on the assumption that the training and testing data are sampled from the same distribution. This assumption rarely holds in realistic scenarios, leading to poor performance when models are deployed in domains whose data distribution differs from that of the training set. Unsupervised Domain Adaptation (UDA) tackles the problem of aligning the data distributions of a labelled source domain and an unlabelled target domain. In contrast, Semi-supervised Domain Adaptation (SSDA) assumes a partially labelled target domain, a more realistic scenario in many computer vision tasks. Domain randomization is a popular approach wherein models are trained on synthetically generated data. With complete control over the synthetic data generation process, domain randomization introduces randomness into various properties of both the objects and the scene. As with data augmentation techniques in deep learning, the hope is to induce invariance to the non-causal features of the data and nudge the model towards learning the causal correlations for the task at hand.

In this thesis, we explore and study various approaches to domain adaptation. First, we present image-level domain adaptation methods, which use image-level manipulations or transformations to achieve domain invariance. We begin by analyzing the domain randomization approach in an object detection setting: using synthetically generated data, we train a Faster R-CNN model for the object detection task. Domain randomization helps boost the performance of object detection models, and a model trained entirely on synthetic data outperforms one trained on real data.
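The randomization of object and scene properties described above can be sketched as a parameter sampler that each synthetic render draws from. This is a hypothetical, minimal illustration: the parameter names, ranges, and asset counts are assumptions for exposition, not the thesis's actual generation pipeline.

```python
import random

def sample_scene_params(rng=random):
    """Hypothetical domain-randomization sampler: each synthetic render
    draws nuisance (non-causal) scene properties at random, so the
    detector cannot rely on any one of them."""
    return {
        "light_intensity": rng.uniform(0.2, 2.0),    # scene lighting strength
        "light_azimuth_deg": rng.uniform(0.0, 360.0),# light direction
        "camera_distance_m": rng.uniform(0.5, 3.0),  # viewpoint distance
        "texture_id": rng.randrange(0, 500),         # random object texture
        "background_id": rng.randrange(0, 200),      # random backdrop
        "object_yaw_deg": rng.uniform(0.0, 360.0),   # object pose
    }
```

Sampling a fresh parameter set per rendered image is what spreads the training distribution over the nuisance factors.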
With fine-tuning, the performance of the model trained on synthetically generated data increases drastically. Next, we extend work on domain adaptation in the frequency domain, wherein image-level adaptation is performed on the frequency components of the images. To this end, we propose new strategies for combining frequency components, including masking techniques that take the frequency of the components into account during combination. Fourier domain adaptation techniques have seen some success in image segmentation from synthetic domains like GTA5 [1] and SYNTHIA [2] to realistic domains like Cityscapes [3]; however, these domains contain structurally similar images. For the synthetic dataset that we use, we find that these frequency-domain stylization methods do not improve performance over domain randomization.

Finally, we present two novel methods for domain adaptation using feature-level alignment. One of the primary challenges in SSDA is the skewed ratio between the number of labelled source and target samples, which biases the model towards the source domain. Recent works in SSDA show that aligning only the labelled target samples with the source samples can lead to incomplete alignment of the target domain with the source domain. In our first approach, we train the source and target feature spaces separately. To ensure that the feature space of the target domain generalizes well, we employ semi-supervised methods to leverage both the labelled and unlabelled samples. Domain Adapters, which are parametric functions, are then trained to learn the feature-level transformation from the target domain to the source domain. During inference, we extract features with the target domain's feature extractor and pass them to the Domain Adapter for that target-source pair; the transformed representation in the source domain's feature space is then fed to the source classifier.
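The inference path just described can be sketched as follows. This is a minimal illustration under stated assumptions: the adapter here is a single linear map and the extractor/classifier are placeholder callables; the thesis's Domain Adapters may be deeper parametric functions.

```python
import numpy as np

class DomainAdapter:
    """Hypothetical parametric adapter mapping target-domain features
    into the source-domain feature space (a single linear layer here)."""
    def __init__(self, dim, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = rng.standard_normal((dim, dim)) * 0.01  # learned in practice
        self.b = np.zeros(dim)

    def __call__(self, target_feats):
        # target_feats: (N, dim) features from the target feature extractor
        return target_feats @ self.W + self.b

def infer(target_extractor, adapter, source_classifier, x):
    """Inference path from the abstract:
    target extractor -> Domain Adapter -> source classifier."""
    z_t = target_extractor(x)      # features in the target space
    z_s = adapter(z_t)             # transformed into the source space
    return source_classifier(z_s)  # classify with the source head
```

Keeping one adapter per target-source pair lets a single source classifier serve multiple target domains.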
We show that keeping the feature extractors separate is advantageous when the domain gap between the source and the target domain is significant.

Finally, we present SPI, which leverages contrastive losses to learn a semantically meaningful and domain-agnostic feature space using the supervised samples from both domains. To mitigate the challenges caused by the skewed label ratio, we pseudo-label the unlabelled target samples by comparing their feature representations to those of the labelled samples from both the source and target domains. Specifically, we use a temperature-scaled cosine similarity measure to assign a soft pseudo-label to each unlabelled target sample, and we maintain an exponential moving average of these soft pseudo-labels. To increase the support of the target domain, the potentially noisy pseudo-labels are progressively injected into (or removed from) the labelled target dataset, based on a confidence threshold, over the course of training, supplementing the alignment of the source and target distributions. Finally, we use a supervised contrastive loss on the labelled and pseudo-labelled datasets to align the source and target distributions. Using our proposed approach, we achieve state-of-the-art performance on semi-supervised domain adaptation benchmark datasets.

Full thesis: pdf

Centre for Data Engineering
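The pseudo-labelling steps described above can be sketched as below. This is a minimal sketch, assuming per-class prototype features computed from the labelled samples; the temperature, momentum, and threshold values are illustrative assumptions, not the thesis's settings.

```python
import numpy as np

def soft_pseudo_labels(feats, protos, tau=0.05):
    """Soft pseudo-labels via temperature-scaled cosine similarity
    between unlabelled features (N, d) and class prototypes (C, d)."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    logits = feats @ protos.T / tau               # (N, C) scaled similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)       # softmax over classes

def ema_update(prev, curr, momentum=0.9):
    """Exponential moving average of soft pseudo-labels over training."""
    return momentum * prev + (1.0 - momentum) * curr

def confident_mask(ema_labels, threshold=0.8):
    """Samples whose EMA pseudo-label confidence clears the threshold
    are injected into the labelled target set (and removed if it drops)."""
    return ema_labels.max(axis=1) >= threshold
```

Averaging the soft labels over time smooths out noisy per-batch assignments before the confidence threshold decides which samples enter the labelled pool.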