This work addresses a version of the two-armed Bernoulli bandit problem in which the means of the two arms sum to one (the symmetric two-armed Bernoulli bandit). In a regime where the gap between these means goes to zero and the number of prediction periods approaches infinity, we obtain the leading-order terms of the minimax optimal regret and pseudoregret for this problem by associating each of them with a solution of a linear heat equation. Our results improve upon previously known results; specifically, we explicitly compute these leading-order terms in three different scaling regimes for the gap. Additionally, we obtain new non-asymptotic bounds for any given time horizon.