IIIT Hyderabad Publications
Learning With Bandit Feedback

Author: Mudit Agrawal
Date: 2022-06-21
Report no: IIIT/TH/2022/72
Advisor: Naresh Manwani

Abstract

Training a highly effective online learning model for complex tasks often hinges on an abundance of high-quality, noise-free labeled data. Acquiring such data, however, is becoming a bottleneck in cost, time, and computational resources. In this thesis, we address two significant issues that current state-of-the-art bandit-feedback-based online learning algorithms fail to handle: (a) noise present in the bandit feedback and (b) the algorithms' heavy reliance on labeled data. To deal with noise in the bandit feedback, we propose a novel algorithm named RCINE that is robust to noisy bandit feedback. The methodology used in RCINE requires knowledge of the noise rates, so we also propose a subroutine called NREst to estimate them, resulting in an end-to-end framework for learning a multiclass classifier under noisy bandit feedback. The proposed algorithm enjoys a mistake bound of order O(√T) in the high-noise case and of order O(T^(2/3)) in the worst case. We demonstrate our approach's effectiveness through extensive experiments on several benchmark datasets. Furthermore, to reduce an online supervised learning algorithm's reliance on labeled data, we propose ALBIF, an efficient stochastic sub-gradient descent algorithm for learning a multiclass classifier under an active bandit feedback setting. ALBIF enjoys a regret bound of order O(log T) in the active learning setting as well as in the standard (non-active) bandit feedback setting. We again demonstrate the effectiveness of the proposed algorithms through extensive experiments on various real-world and synthetic datasets against several benchmark algorithms.

Full thesis: pdf

Centre for Machine Learning Lab
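The abstract builds on the standard bandit-feedback protocol for multiclass classification: after predicting a label, the learner observes only a one-bit signal (whether the prediction was correct), never the true label. As a minimal sketch of that protocol only (not of RCINE, NREst, or ALBIF themselves, whose details are in the thesis), the Banditron-style learner below mixes a greedy prediction with uniform exploration and makes an importance-weighted update from the single-bit feedback; all parameter choices here are illustrative.

```python
import numpy as np

def banditron(X, y, n_classes, gamma=0.2, seed=0):
    """Multiclass online learner under bandit feedback.

    The learner never sees the true label `label`; it only observes
    `correct`, i.e. whether its (randomized) prediction matched it.
    gamma is the exploration rate (illustrative default).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros((n_classes, d))   # one weight vector per class
    mistakes = 0
    for x, label in zip(X, y):
        scores = W @ x
        y_hat = int(np.argmax(scores))
        # Exploration: mostly play the greedy label, sometimes sample uniformly.
        probs = np.full(n_classes, gamma / n_classes)
        probs[y_hat] += 1.0 - gamma
        y_tilde = int(rng.choice(n_classes, p=probs))
        correct = (y_tilde == label)   # the ONLY feedback the learner sees
        if not correct:
            mistakes += 1
        # Importance-weighted update: an unbiased substitute for the
        # full-information perceptron update, built from the one-bit signal.
        U = np.zeros((n_classes, d))
        if correct:
            U[y_tilde] += x / probs[y_tilde]
        U[y_hat] -= x
        W += U
    return W, mistakes
```

On well-separated data the learned weights recover the classes despite the learner never observing a true label directly; the forced exploration is what makes the importance-weighted update unbiased.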