In this work, we close a fundamental gap between theory and practice by providing an improved regret bound for linear ensemble sampling. We prove that, with an ensemble size logarithmic in $T$, linear ensemble sampling achieves a frequentist regret bound of $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$, where $d$ is the dimension of the parameter and $T$ is the time horizon. This matches state-of-the-art results for randomized linear bandit algorithms. Our approach introduces a general regret analysis framework for linear bandit algorithms. Additionally, we reveal a relationship between linear ensemble sampling and Linear Perturbed-History Exploration (LinPHE): LinPHE is a special case of linear ensemble sampling in which the ensemble size equals $T$. This insight allows us to derive a new regret bound of $\tilde{\mathcal{O}}(d^{3/2}\sqrt{T})$ for LinPHE that is independent of the number of arms. Our contributions advance the theoretical foundation of ensemble sampling, bringing its regret bounds in line with the best known bounds for other randomized exploration algorithms.
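For readers unfamiliar with the algorithm, the following is a minimal sketch of a generic linear ensemble sampling loop, not the exact procedure analyzed in this work. It assumes a fixed finite arm set, Gaussian reward noise and Gaussian perturbations, a shared ridge-regularized Gram matrix, and uniform selection of the ensemble member at each round. The function name `linear_ensemble_sampling` and the parameters `lam`, `noise_sd`, and `perturb_sd` are hypothetical and chosen only for illustration.

```python
import numpy as np


def linear_ensemble_sampling(arms, theta_star, T, m, lam=1.0,
                             noise_sd=0.1, perturb_sd=1.0, seed=0):
    """Illustrative linear ensemble sampling; returns cumulative regret per round.

    arms:       (K, d) array of arm feature vectors (assumed fixed over time)
    theta_star: (d,) true parameter, used only to simulate rewards and regret
    m:          ensemble size
    """
    rng = np.random.default_rng(seed)
    _, d = arms.shape

    # Gram matrix shared by all ensemble members (ridge-regularized).
    V = lam * np.eye(d)
    # One perturbed response vector per member. The random initialization
    # acts as a perturbed prior (an assumption of this sketch).
    B = np.sqrt(lam) * perturb_sd * rng.standard_normal((m, d))

    means = arms @ theta_star
    best = means.max()
    regret = np.zeros(T)
    total = 0.0

    for t in range(T):
        # Select one ensemble member uniformly at random.
        j = rng.integers(m)
        # Its parameter estimate is a ridge solution on perturbed rewards.
        theta_j = np.linalg.solve(V, B[j])
        # Act greedily with respect to the selected member.
        a = int(np.argmax(arms @ theta_j))
        x = arms[a]

        # Observe a noisy reward (simulated here).
        y = means[a] + noise_sd * rng.standard_normal()

        # Update every member with an independently perturbed reward.
        V += np.outer(x, x)
        z = perturb_sd * rng.standard_normal(m)
        B += (y + z)[:, None] * x[None, :]

        total += best - means[a]
        regret[t] = total

    return regret
```

As a usage example, an ensemble size on the order of $\log T$ could be set with `m = max(1, int(np.ceil(np.log(T))))`. The constant in front of the logarithm is an arbitrary choice here and is not taken from the analysis.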