We propose feature perturbation, a simple yet powerful technique that injects randomness directly into the feature inputs, rather than randomizing unknown parameters or adding noise to rewards. Remarkably, this algorithm achieves a worst-case regret bound of $\tilde{\mathcal{O}}(d\sqrt{T})$ for generalized linear bandits, while avoiding the $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$ regret typical of existing randomized bandit algorithms. Because our algorithm eschews parameter sampling, it is computationally efficient and extends naturally to non-parametric or neural network models. We verify these advantages through empirical evaluations, demonstrating that feature perturbation not only surpasses existing methods but also unifies strong practical performance with the best-known theoretical guarantees.
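The abstract does not specify the algorithm's exact form, so the following is a minimal illustrative sketch only: it assumes a linear reward model, isotropic Gaussian perturbations of the arm features, and a ridge-regression estimate of the unknown parameter. The names `feature_perturbation_bandit`, `sigma`, and `reward_fn` are hypothetical, not from the paper.

```python
import numpy as np

def feature_perturbation_bandit(arms, reward_fn, T, sigma=0.1, lam=1.0):
    """Sketch of feature perturbation for a linear bandit.

    arms:      (K, d) array of arm feature vectors.
    reward_fn: callable mapping a chosen feature vector to a noisy reward.
    sigma:     scale of the Gaussian feature perturbation (assumed form).
    lam:       ridge-regularization strength.
    """
    K, d = arms.shape
    V = lam * np.eye(d)   # regularized Gram matrix of played features
    b = np.zeros(d)       # accumulated reward-weighted features
    for t in range(T):
        theta_hat = np.linalg.solve(V, b)  # ridge estimate of the parameter
        # Randomness enters through the features, not the parameter:
        perturbed = arms + sigma * np.random.randn(K, d)
        k = int(np.argmax(perturbed @ theta_hat))  # greedy on perturbed features
        x = arms[k]
        r = reward_fn(x)
        V += np.outer(x, x)  # rank-one update of the Gram matrix
        b += r * x
    return np.linalg.solve(V, b)
```

Because the randomness enters through the features rather than through a sampled parameter, each round needs only a single regularized least-squares solve, which is what lets the approach carry over to models without a tractable posterior.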