High-dimensional linear bandits with low-dimensional structure have received considerable attention in recent studies due to their practical significance. The most common structure in the literature is sparsity, which may not hold in practice. Symmetry, where the reward is invariant under certain groups of transformations on the set of arms, is another important inductive bias in the high-dimensional case, and it covers many standard structures, including sparsity. In this work, we study high-dimensional symmetric linear bandits where the symmetry is hidden from the learner and the correct symmetry must be learned online. We examine the structure of a collection of hidden symmetries and provide a method based on model selection within the collection of low-dimensional subspaces. Our algorithm achieves a regret bound of $O(d_0^{1/3} T^{2/3} \log(d))$, where $d$ is the ambient dimension, which is potentially very large, and $d_0 \ll d$ is the dimension of the true low-dimensional subspace. Under an additional assumption of well-separated models, the regret improves to $O(d_0\sqrt{T\log(d)})$.
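To make the link between symmetry and low-dimensional structure concrete, the following is a minimal sketch, not the algorithm from this work. It assumes a hypothetical symmetry group that permutes coordinates within the blocks of a partition of $\{1, \dots, d\}$; a reward parameter invariant under that group is constant on each block, so it lies in a subspace whose dimension $d_0$ equals the number of blocks. The partition `labels`, the permutation `perm`, and the helper `block_average_projector` are illustrative assumptions.

```python
import numpy as np

def block_average_projector(labels):
    """Orthogonal projector onto vectors that are constant on each block.

    `labels[i]` is the (hypothetical) block index of coordinate i.
    """
    labels = np.asarray(labels)
    d = labels.size
    P = np.zeros((d, d))
    for b in np.unique(labels):
        idx = np.flatnonzero(labels == b)
        # Averaging within a block projects onto "constant on this block".
        P[np.ix_(idx, idx)] = 1.0 / idx.size
    return P

rng = np.random.default_rng(0)

# Assumed toy setting: ambient dimension d = 6, partition into d_0 = 3 blocks.
labels = np.array([0, 0, 0, 1, 1, 2])
P = block_average_projector(labels)

# A parameter that respects the symmetry: constant on every block.
theta = P @ rng.normal(size=labels.size)

# A group element: a permutation that only moves coordinates within blocks.
perm = np.array([1, 2, 0, 4, 3, 5])

# The linear reward <x, theta> is unchanged when the arm x is transformed.
x = rng.normal(size=labels.size)
reward = x @ theta
reward_perm = x[perm] @ theta

d0 = np.linalg.matrix_rank(P)  # dimension of the invariant subspace
```

In this toy setting the learner would not know `labels`; each candidate partition defines one candidate low-dimensional subspace, which is the sense in which learning the hidden symmetry can be viewed as selecting among subspaces.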