High-dimensional linear contextual bandit problems remain a significant challenge due to the curse of dimensionality. Existing methods typically assume either that the model parameters are sparse or that the eigenvalues of the context covariance matrices are (approximately) sparse, and they lack general applicability because conventional reward estimators are rigid. To overcome this limitation, this work introduces a pointwise estimator that adapts to both kinds of sparsity. Based on this pointwise estimator, a novel algorithm, termed HOPE, is proposed. Theoretical analyses show that HOPE not only achieves improved regret bounds in previously studied homogeneous settings (i.e., with only one type of sparsity) but also, for the first time, efficiently handles two new and challenging heterogeneous settings (i.e., with a mixture of the two types of sparsity), highlighting its flexibility and generality. Experiments corroborate the superiority of HOPE over existing methods across various scenarios.