In this paper, we aim to build a novel bandit algorithm that fully exploits multi-dimensional data and the inherent non-linearity of reward functions to provide highly usable and accountable decision-making services. To this end, we introduce a generalized low-rank tensor contextual bandits model in which an action is formed from three feature vectors and can thus be represented by a tensor. In this formulation, the reward is determined by a generalized linear function applied to the inner product of the action's feature tensor and a fixed but unknown parameter tensor with low tubal rank. To balance exploration and exploitation, we introduce a novel algorithm called "Generalized Low-Rank Tensor Exploration Subspace then Refine" (G-LowTESTR). This algorithm first collects raw data to explore the intrinsic low-rank tensor subspace information embedded in the decision-making scenario, and then converts the original problem into an almost lower-dimensional generalized linear contextual bandits problem. Rigorous theoretical analysis shows that the regret bound of G-LowTESTR is superior to those obtained in the vectorization and matricization cases. We conduct a series of simulations and real-data experiments that further highlight the effectiveness of G-LowTESTR, which capitalizes on the low-rank tensor structure for enhanced learning.
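To make the reward model concrete, the following is a minimal sketch and not the paper's implementation. It assumes the action tensor is the outer product of the three feature vectors, that the link function is logistic, and that a low-tubal-rank parameter tensor can be simulated as the t-product of two random factors. All function names, dimensions, and the normalization are hypothetical choices made for illustration only.

```python
import numpy as np


def t_product(a, b):
    """t-product of a (n1 x r x n3) and b (r x n2 x n3), computed slice-wise in the Fourier domain."""
    a_hat = np.fft.fft(a, axis=2)
    b_hat = np.fft.fft(b, axis=2)
    # Multiply matching frontal slices: c_hat[:, :, k] = a_hat[:, :, k] @ b_hat[:, :, k]
    c_hat = np.einsum("irk,rjk->ijk", a_hat, b_hat)
    return np.real(np.fft.ifft(c_hat, axis=2))


def tubal_rank(t, tol=1e-8):
    """Tubal rank: the largest rank among the frontal slices of the Fourier-transformed tensor."""
    t_hat = np.fft.fft(t, axis=2)
    return max(int(np.linalg.matrix_rank(t_hat[:, :, k], tol=tol)) for k in range(t.shape[2]))


def low_tubal_rank_tensor(d1, d2, d3, r, rng):
    """Simulated parameter tensor with tubal rank at most r, scaled to unit Frobenius norm (an assumption)."""
    theta = t_product(rng.normal(size=(d1, r, d3)), rng.normal(size=(r, d2, d3)))
    return theta / np.linalg.norm(theta)


def action_tensor(x, y, z):
    """Assumed construction: the action's feature tensor is the outer product of three feature vectors."""
    return np.einsum("i,j,k->ijk", x, y, z)


def expected_reward(x, y, z, theta):
    """Generalized linear reward: a link function applied to <action tensor, theta>.

    The logistic link is an illustrative assumption; the abstract does not fix a specific link.
    """
    inner = np.sum(action_tensor(x, y, z) * theta)
    return 1.0 / (1.0 + np.exp(-inner))


rng = np.random.default_rng(0)
theta = low_tubal_rank_tensor(d1=4, d2=5, d3=3, r=2, rng=rng)

# Unit-norm feature vectors keep the inner product bounded in [-1, 1].
x, y, z = (v / np.linalg.norm(v) for v in (rng.normal(size=4), rng.normal(size=5), rng.normal(size=3)))

print("tubal rank of theta:", tubal_rank(theta))
print("expected reward:", expected_reward(x, y, z, theta))
```

In this sketch the full parameter tensor has d1·d2·d3 entries, while the two t-product factors have only r·(d1 + d2)·d3. That gap is the kind of low-rank structure a subspace-exploration stage could exploit before switching to a lower-dimensional generalized linear bandit.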