Symmetry arises in many optimization and decision-making problems and has attracted considerable attention from the optimization community: by exploiting such symmetries, the search for optimal solutions can be significantly improved. Despite this success in (offline) optimization, the use of symmetries has not been well examined in online optimization settings, especially in the bandit literature. In this paper, we therefore study the invariant Lipschitz bandit setting, a subclass of Lipschitz bandits in which the reward function and the set of arms are preserved under a group of transformations. We introduce an algorithm named \texttt{UniformMesh-N}, which naturally integrates side observations from group orbits into \texttt{UniformMesh} \cite{Kleinberg2005_UniformMesh}, an algorithm that uniformly discretizes the set of arms. Using the side-observation approach, we prove an improved regret upper bound that depends on the cardinality of the group, provided the group is finite. We also prove a matching regret lower bound for the invariant Lipschitz bandit class (up to logarithmic factors). We hope that our work will spark further investigation of symmetry in bandit theory and, more generally, in sequential decision-making theory.
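To make the orbit-based side-observation idea concrete, the following is a minimal Python sketch under assumptions of our own, not the paper's exact algorithm: arms lie in $[0,1]$, the group is the two-element group $\{\mathrm{id}, x \mapsto 1-x\}$ acting on a midpoint mesh, and arms are chosen with plain UCB1 indices. The names `uniform_mesh_n` and `orbit` are hypothetical. Because the reward is invariant ($f(gx) = f(x)$), each observed reward is credited to every mesh point in the pulled arm's orbit.

```python
import math
import random


def orbit(i, group):
    """Return the set of mesh indices in the orbit of index i.

    Hypothetical representation: `group` is a list of functions acting on
    mesh indices, assumed to map the mesh onto itself.
    """
    return {g(i) for g in group}


def uniform_mesh_n(reward, noise, K, T, group, rng):
    """UCB1 over a uniform K-point mesh of [0, 1] with orbit side observations.

    Sketch only: each pulled reward is shared with every distinct mesh
    point in the pulled arm's orbit, since the reward function is invariant.
    """
    arms = [(i + 0.5) / K for i in range(K)]  # midpoint mesh, symmetric under x -> 1 - x
    counts = [0] * K
    sums = [0.0] * K
    total = 0.0
    for t in range(1, T + 1):
        unseen = [i for i in range(K) if counts[i] == 0]
        if unseen:
            # Initialization: pull the first arm with no observations yet.
            i = unseen[0]
        else:
            # Standard UCB1 index: empirical mean plus exploration bonus.
            i = max(range(K), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2 * math.log(t) / counts[j]))
        r = reward(arms[i]) + noise * rng.gauss(0.0, 1.0)
        total += r
        # Invariance f(g x) = f(x) makes r a valid sample for the whole orbit.
        for j in orbit(i, group):
            counts[j] += 1
            sums[j] += r
    return counts, total


# Example: a reflection-invariant, 1-Lipschitz reward on [0, 1].
K, T = 10, 2000
group = [lambda i: i, lambda i: K - 1 - i]  # identity and reflection x -> 1 - x
counts, total = uniform_mesh_n(lambda x: abs(x - 0.5), 0.1, K, T, group,
                               random.Random(0))
print(counts)
```

With an even `K`, no mesh point is fixed by the reflection, so every pull yields two observations and the initialization phase needs only `K / 2` rounds instead of `K`; this halving of exploration is the intuition behind a regret bound that improves with the group's cardinality.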