We analyse linear ensemble sampling (ES) with standard Gaussian perturbations in stochastic linear bandits. We show that for ensemble size $m=\Theta(d\log n)$, ES attains $\tilde O(d^{3/2}\sqrt n)$ high-probability regret, closing the gap to the Thompson sampling benchmark at comparable computational cost. The proof offers a new perspective on randomized exploration in linear bandits by reducing the analysis to a time-uniform exceedance problem for $m$ independent Brownian motions. Intriguingly, this continuous-time lens is not forced; it appears natural, and perhaps necessary: the discrete-time problem seems to call for a continuous-time solution, and we know of no other way to obtain a sharp ES bound.
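To make the object of study concrete, here is a minimal Python sketch of one common form of linear ensemble sampling with Gaussian perturbations. It is an illustration under stated assumptions, not necessarily the exact variant analysed here. The ridge parameter `lam`, the perturbation scale `sigma`, the Gaussian prior draw, and the constant `c` in the hypothetical helper `ensemble_size` are all illustrative choices. Each of the $m$ members fits a regularised least-squares estimate to its own independently perturbed copy of the rewards. In every round one member is drawn uniformly at random, and the learner plays the action that is greedy for that member.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(A, v):
    return [dot(row, v) for row in A]


def ensemble_size(d, n, c=1.0):
    # m = Theta(d log n); the constant c is a hypothetical choice, not taken from the paper.
    return max(1, math.ceil(c * d * math.log(n)))


class LinearEnsembleSampler:
    """Sketch of linear ensemble sampling with Gaussian perturbations (assumed variant)."""

    def __init__(self, d, m, lam=1.0, sigma=1.0, seed=0):
        self.d, self.m, self.sigma = d, m, sigma
        self.rng = random.Random(seed)
        # V = lam * I initially; we store V^{-1} and keep it current with Sherman-Morrison.
        self.V_inv = [[1.0 / lam if i == k else 0.0 for k in range(d)] for i in range(d)]
        # Member j stores b_j, so that theta_j = V^{-1} b_j. The initial Gaussian draw
        # sigma * sqrt(lam) * Z_j, Z_j ~ N(0, I), plays the role of a prior sample.
        self.b = [[sigma * math.sqrt(lam) * self.rng.gauss(0.0, 1.0) for _ in range(d)]
                  for _ in range(m)]

    def theta(self, j):
        # Perturbed ridge estimate of ensemble member j.
        return mat_vec(self.V_inv, self.b[j])

    def select(self, actions):
        # Draw one member uniformly at random and play the action that is greedy for it.
        theta = self.theta(self.rng.randrange(self.m))
        return max(range(len(actions)), key=lambda k: dot(actions[k], theta))

    def update(self, x, y):
        # Sherman-Morrison: (V + x x^T)^{-1} = V^{-1} - V^{-1} x x^T V^{-1} / (1 + x^T V^{-1} x).
        Vx = mat_vec(self.V_inv, x)
        denom = 1.0 + dot(x, Vx)
        self.V_inv = [[self.V_inv[i][k] - Vx[i] * Vx[k] / denom for k in range(self.d)]
                      for i in range(self.d)]
        # Every member sees the same (x, y), each with its own independent N(0, sigma^2) perturbation.
        for j in range(self.m):
            y_tilde = y + self.sigma * self.rng.gauss(0.0, 1.0)
            self.b[j] = [bi + xi * y_tilde for bi, xi in zip(self.b[j], x)]


if __name__ == "__main__":
    # Toy simulation; theta_star and the action set are made up for illustration.
    rng = random.Random(42)
    d, n = 3, 2000
    theta_star = [0.6, -0.3, 0.5]
    actions = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(20)]
    best = max(dot(a, theta_star) for a in actions)
    agent = LinearEnsembleSampler(d, ensemble_size(d, n), lam=1.0, sigma=0.5, seed=7)
    regret = 0.0
    for _ in range(n):
        x = actions[agent.select(actions)]
        y = dot(x, theta_star) + rng.gauss(0.0, 0.5)
        agent.update(x, y)
        regret += best - dot(x, theta_star)
    print(f"cumulative pseudo-regret after {n} rounds: {regret:.1f}")
```

For a fixed sequence of actions, each member's deviation from the unperturbed ridge estimate $V^{-1}\sum_s x_s y_s$ has covariance $\sigma^2 V^{-1}$, the Thompson sampling posterior covariance in the Gaussian model. In this sketch, the Sherman–Morrison update keeps the cost of each `update` call at $O(d^2 + md)$.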