Symmetry arises in many optimization and decision-making problems and has attracted considerable attention from the optimization community: exploiting such symmetries can significantly improve the search for optimal solutions. Despite this success in (offline) optimization, the use of symmetry has not been well examined in online optimization settings, especially in the bandit literature. In this paper, we therefore study the invariant Lipschitz bandit setting, a subclass of Lipschitz bandits in which the reward function and the set of arms are preserved under a group of transformations. We introduce an algorithm named \texttt{UniformMesh-N}, which uses group orbits to integrate side observations into the \texttt{UniformMesh} algorithm (\cite{Kleinberg2005_UniformMesh}), an algorithm that uniformly discretizes the set of arms. Using the side-observation approach, we prove an improved regret upper bound that depends on the cardinality of the group, provided the group is finite. We also prove a matching regret lower bound for the invariant Lipschitz bandit class (up to logarithmic factors). We hope that our work will stimulate further investigation of symmetry in bandit theory and in sequential decision-making more generally.
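The idea of sharing observations across a group orbit on a uniform mesh can be illustrated with a toy example. The sketch below is not the paper's \texttt{UniformMesh-N}; it is a plain UCB rule on a uniform mesh of $[0, 1]$ under assumptions chosen only for illustration: the group is the two-element reflection group $\{x \mapsto x,\ x \mapsto 1 - x\}$, the mean reward `reward_mean` is a made-up invariant 1-Lipschitz function, the noise is Gaussian, and all names (`reflect`, `reward_mean`, `orbit_ucb`) are hypothetical.

```python
import math
import random


def reflect(x):
    """Hypothetical symmetry: the reflection x -> 1 - x on [0, 1]."""
    return 1.0 - x


def reward_mean(x):
    """Hypothetical 1-Lipschitz mean reward, invariant under reflect()."""
    return 0.8 - abs(abs(x - 0.5) - 0.3)


def orbit_ucb(horizon, num_cells, seed=0):
    """UCB over a uniform mesh, pooling statistics across each group orbit."""
    rng = random.Random(seed)
    # Uniform discretization: one arm at the center of each mesh cell.
    centers = [(i + 0.5) / num_cells for i in range(num_cells)]
    # Cell i and its mirror image num_cells - 1 - i form one orbit; statistics
    # are stored once per orbit, at the smaller of the two cell indices.
    orbit_of = [min(i, num_cells - 1 - i) for i in range(num_cells)]
    counts = [0] * num_cells
    sums = [0.0] * num_cells
    total_reward = 0.0

    for t in range(1, horizon + 1):
        def index(i):
            rep = orbit_of[i]
            if counts[rep] == 0:
                return float("inf")  # force one pull per orbit first
            mean = sums[rep] / counts[rep]
            return mean + math.sqrt(2.0 * math.log(t) / counts[rep])

        arm = max(range(num_cells), key=index)
        # Assumed noise model: Gaussian with standard deviation 0.1.
        reward = reward_mean(centers[arm]) + rng.gauss(0.0, 0.1)
        # The observation is credited to the whole orbit of the pulled arm.
        rep = orbit_of[arm]
        counts[rep] += 1
        sums[rep] += reward
        total_reward += reward

    return total_reward, counts
```

Calling `orbit_ucb(200, 10)` explores only the five orbit representatives rather than all ten mesh points, which is the intuition behind a regret bound that improves with the size of a finite group.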