We study the \emph{linear contextual bandit problem} with independent and identically distributed (i.i.d.) contexts. For this problem, we aim to develop a \emph{Best-of-Both-Worlds} (BoBW) algorithm, which attains regret upper bounds in both the stochastic and adversarial regimes. We develop an algorithm based on \emph{Follow-The-Regularized-Leader} (FTRL) with the Tsallis entropy, referred to as the $\alpha$-\emph{Linear-Contextual (LC)-Tsallis-INF}. We show that its regret is at most $O(\log(T))$ in the stochastic regime under the assumption that the suboptimality gap is uniformly bounded from below, and at most $O(\sqrt{T})$ in the adversarial regime. Furthermore, we extend our regret analysis to more general regimes characterized by the \emph{margin condition} with a parameter $\beta \in (1, \infty]$, which imposes a milder assumption on the suboptimality gap. We show that the proposed algorithm achieves a regret of $O\left((\log T)^{\frac{1+\beta}{2+\beta}}\,T^{\frac{1}{2+\beta}}\right)$ under the margin condition.
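To make the FTRL component concrete, the following display is a minimal sketch of the standard FTRL update with $\alpha$-Tsallis-entropy regularization over a finite action set, as used in Tsallis-INF; the notation here ($\widehat{\ell}_s$ for the estimated loss vector, $\eta_t$ for the learning rate, $\Delta_K$ for the probability simplex over $K$ actions) is illustrative, and the precise regularizer, loss estimators, and context dependence of $\alpha$-LC-Tsallis-INF are specified in the body of the paper:
\[
q_t \in \operatorname*{arg\,min}_{q \in \Delta_K} \left\{ \left\langle \sum_{s=1}^{t-1} \widehat{\ell}_s,\, q \right\rangle + \frac{1}{\eta_t} \cdot \frac{1 - \sum_{a=1}^{K} q_a^{\alpha}}{1 - \alpha} \right\}, \qquad \alpha \in (0, 1).
\]
The parameter $\alpha$ interpolates between Shannon-entropy regularization in the limit $\alpha \to 1$ and more aggressive exploration-suppressing regularizers as $\alpha \to 0$; the choice $\alpha = 1/2$ recovers the regularizer of the original Tsallis-INF algorithm for multi-armed bandits.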