We propose feature perturbation, a simple yet effective exploration strategy for contextual bandits that injects randomness directly into the feature inputs rather than randomizing unknown parameters or adding noise to rewards. Remarkably, this algorithm achieves an $\tilde{\mathcal{O}}(d\sqrt{T})$ worst-case regret bound for generalized linear contextual bandits, while avoiding the $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$ regret typical of existing randomized bandit algorithms. Because our algorithm eschews parameter sampling, it is computationally efficient and extends naturally to non-parametric and neural network models. We verify these advantages through empirical evaluations, demonstrating that feature perturbation not only surpasses existing methods but also combines strong practical performance with near-optimal regret guarantees.
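To make the idea concrete, the following is a minimal sketch, not the paper's exact algorithm, of feature-perturbation exploration in a simulated linear contextual bandit: at each round the observed arm features are perturbed with Gaussian noise and the agent acts greedily with respect to its current point estimate. The perturbation scale `sigma_perturb`, the ridge-regression update, and the simulation setup are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 10, 2000                          # feature dim, arms per round, horizon
theta_star = rng.normal(size=d) / np.sqrt(d)   # unknown parameter (simulation only)

A = np.eye(d)          # ridge-regularized Gram matrix
b = np.zeros(d)        # accumulated feature-weighted rewards
sigma_perturb = 0.5    # feature-perturbation scale (hypothetical tuning)

for t in range(T):
    X = rng.normal(size=(K, d))        # observed context features for the K arms
    theta_hat = np.linalg.solve(A, b)  # current ridge estimate of theta_star

    # Exploration by feature perturbation: inject noise into the *inputs*,
    # then act greedily with respect to the point estimate theta_hat.
    X_pert = X + sigma_perturb * rng.normal(size=(K, d))
    arm = int(np.argmax(X_pert @ theta_hat))

    # Observe a noisy linear reward for the chosen arm and update statistics.
    reward = X[arm] @ theta_star + 0.1 * rng.normal()
    A += np.outer(X[arm], X[arm])
    b += reward * X[arm]
```

Since exploration comes only from perturbing the inputs, the same loop applies unchanged when `theta_hat @ x` is replaced by any reward model that maps features to predicted rewards, which is the sense in which the approach avoids parameter sampling.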