We investigate the high-dimensional linear contextual bandit problem in which the number of features $p$ exceeds the budget $T$ and may even be infinite. Unlike the majority of previous work in this field, we do not impose sparsity on the regression coefficients. Instead, we rely on recent findings on overparameterized models, which enable us to analyze the performance of the minimum-norm interpolating estimator when data distributions have small effective ranks. We propose an explore-then-commit (EtC) algorithm to address this problem and examine its performance. Through our analysis, we derive the optimal rate of the EtC algorithm in terms of $T$ and show that this rate can be achieved by balancing exploration and exploitation. Moreover, we introduce an adaptive explore-then-commit (AEtC) algorithm that adaptively finds the optimal balance. We assess the performance of the proposed algorithms through a series of simulations.
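To make the explore-then-commit idea concrete, the following is a minimal simulation sketch, not the algorithm as specified in the paper. It assumes a per-arm linear reward model, round-robin exploration, Gaussian contexts whose covariance eigenvalues decay as $1/j^2$ (one simple way to obtain a small effective rank), and Gaussian noise. The names `min_norm_interpolator`, `run_etc`, `n_explore`, and `noise_sd` are hypothetical and introduced only for illustration.

```python
import numpy as np


def min_norm_interpolator(X, y):
    """Minimum-l2-norm solution of X @ theta = y, computed via the pseudoinverse.

    When X has more columns than rows and full row rank, this estimator
    fits the observed rewards exactly (it interpolates them).
    """
    return np.linalg.pinv(X) @ y


def run_etc(T, K, p, n_explore, rng, noise_sd=0.1):
    """Illustrative explore-then-commit loop (assumed setup, see lead-in).

    Rounds 0..n_explore-1 pull arms in round-robin order; afterwards the
    per-arm parameters are estimated once and the greedy arm is pulled.
    Requires K <= n_explore < T so every arm has data before committing.
    """
    # Assumed covariance spectrum: fast decay gives a small effective rank.
    eigvals = 1.0 / np.arange(1, p + 1) ** 2
    # Assumed ground truth: one parameter vector per arm.
    theta = rng.normal(size=(K, p))

    X_hist = [[] for _ in range(K)]
    y_hist = [[] for _ in range(K)]
    theta_hat = None
    regret = 0.0

    for t in range(T):
        # One context vector per arm, scaled to match the assumed spectrum.
        x = rng.normal(size=(K, p)) * np.sqrt(eigvals)
        means = np.einsum("kp,kp->k", x, theta)

        if t < n_explore:
            arm = t % K  # exploration phase: round-robin
        else:
            if theta_hat is None:
                # Fit each arm once, at the end of exploration.
                theta_hat = np.stack([
                    min_norm_interpolator(np.array(X_hist[k]), np.array(y_hist[k]))
                    for k in range(K)
                ])
            # Commit phase: pull the arm with the largest predicted reward.
            arm = int(np.argmax(np.einsum("kp,kp->k", x, theta_hat)))

        reward = means[arm] + noise_sd * rng.normal()
        if t < n_explore:
            X_hist[arm].append(x[arm])
            y_hist[arm].append(reward)

        # Regret against the best arm for this round's contexts.
        regret += means.max() - means[arm]

    return regret, theta_hat, X_hist, y_hist


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # p > T, matching the overparameterized regime described above.
    regret, _, _, _ = run_etc(T=200, K=2, p=300, n_explore=40, rng=rng)
    print(f"cumulative regret: {regret:.3f}")
```

Varying `n_explore` in this sketch exposes the trade-off described above: too few exploration rounds yield poor estimates, while too many spend budget on arms chosen without regard to their predicted reward.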