Most existing studies on linear bandits focus on a one-dimensional characterization of the overall system. While representative, this formulation may fail to model applications with high-dimensional but favorable structures, such as the low-rank tensor representation for recommender systems. To address this limitation, this work studies a general tensor bandits model, where actions and system parameters are represented by tensors as opposed to vectors, with a particular focus on the case where the unknown system tensor is low-rank. A novel bandit algorithm, coined TOFU (Tensor Optimism in the Face of Uncertainty), is developed. TOFU first leverages flexible tensor regression techniques to estimate the low-dimensional subspaces associated with the system tensor. These estimates are then used to convert the original problem into a new one with norm constraints on its system parameters. Lastly, TOFU adopts a norm-constrained bandit subroutine, which uses these constraints to avoid exploring the entire high-dimensional parameter space. Theoretical analyses show that TOFU improves the best-known regret upper bound by a multiplicative factor that grows exponentially in the system order. A novel performance lower bound is also established, which further corroborates the efficiency of TOFU.
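The subspace-estimation and problem-conversion steps described above can be illustrated with a small numerical sketch. This is not the TOFU algorithm itself: it assumes a Tucker-style low-rank system tensor, stands in for the tensor regression output with a slightly perturbed copy of the true tensor, and estimates the per-mode subspaces with a plain SVD of each mode unfolding. All function names, dimensions, and the noise level are hypothetical choices made for illustration.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode unfolding: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along `mode` by `matrix` of shape (new_dim, old_dim)."""
    moved = np.moveaxis(tensor, mode, -1)
    return np.moveaxis(moved @ matrix.T, -1, mode)

def low_rank_tensor(dims, rank, rng):
    """Random tensor with multilinear rank at most `rank` in every mode."""
    tensor = rng.standard_normal((rank,) * len(dims))
    for mode, dim in enumerate(dims):
        factor, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
        tensor = mode_product(tensor, factor, mode)
    return tensor

def estimate_bases(tensor_estimate):
    """Orthonormal basis per mode; the leading columns span the estimated subspace."""
    bases = []
    for mode in range(tensor_estimate.ndim):
        u, _, _ = np.linalg.svd(unfold(tensor_estimate, mode), full_matrices=True)
        bases.append(u)
    return bases

def rotate(tensor, bases):
    """Express `tensor` in the coordinates given by the per-mode bases."""
    for mode, u in enumerate(bases):
        tensor = mode_product(tensor, u.T, mode)
    return tensor

rng = np.random.default_rng(0)
dims, rank = (6, 6, 6), 2

theta = low_rank_tensor(dims, rank, rng)                   # unknown system tensor
theta_hat = theta + 1e-4 * rng.standard_normal(dims)       # stand-in for a regression estimate
bases = estimate_bases(theta_hat)

# After rotation, almost all of the system tensor sits in the leading
# rank x rank x rank block; the remaining entries have small norm.
theta_rot = rotate(theta, bases)
head = theta_rot[:rank, :rank, :rank]
total_sq = np.sum(theta_rot ** 2)
tail_ratio = np.sqrt(max(total_sq - np.sum(head ** 2), 0.0) / total_sq)

# The expected reward <action, theta> is unchanged by the orthogonal rotation.
action = rng.standard_normal(dims)
reward_original = np.sum(action * theta)
reward_rotated = np.sum(rotate(action, bases) * theta_rot)

print("relative norm outside the leading block:", tail_ratio)
print("rewards match:", np.isclose(reward_original, reward_rotated))
```

In the rotated coordinates the reward is still linear in the action, but the entries of the system tensor outside the leading block are known to be small. A norm constraint of this kind is what allows a bandit subroutine to avoid exploring the full parameter space.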