In this note, we give a new lower bound for the $\gamma$-regret in bandit problems, that is, the regret incurred when comparing against a benchmark that is $\gamma$ times the value of the optimal solution: $\mathsf{Reg}_{\gamma}(T) = \sum_{t = 1}^T \bigl(\gamma \max_{\pi} f(\pi) - f(\pi_t)\bigr)$. The $\gamma$-regret arises in structured bandit problems where finding an exact optimum of $f$ is intractable. Our lower bound is stated in terms of a modification of the constrained Decision-Estimation Coefficient (DEC) of~\citet{foster2023tight} (and closely related to the original offset DEC of~\citet{foster2021statistical}), which we term the $\gamma$-DEC. When restricted to the traditional regret setting where $\gamma = 1$, our result removes the logarithmic factors in the lower bound of~\citet{foster2023tight}.