We analyse linear ensemble sampling (ES) with standard Gaussian perturbations in stochastic linear bandits. We show that for ensemble size $m=\Theta(d\log n)$, ES attains $\tilde O(d^{3/2}\sqrt n)$ high-probability regret, closing the gap to the Thompson sampling benchmark while keeping computation comparable. The proof brings a new perspective on randomized exploration in linear bandits by reducing the analysis to a time-uniform exceedance problem for $m$ independent Brownian motions. This continuous-time lens appears particularly natural here: it yields an exact representation of the relevant discrete-time processes, and we do not know another route to a sharp ES bound.