We provide the first useful and rigorous analysis of ensemble sampling for the stochastic linear bandit setting. In particular, we show that, under standard assumptions, for a $d$-dimensional stochastic linear bandit with an interaction horizon $T$, ensemble sampling with an ensemble whose size is of order $\smash{d \log T}$ incurs regret of order at most $\smash{(d \log T)^{5/2} \sqrt{T}}$. Ours is the first result in any structured setting that obtains regret of near $\smash{\sqrt{T}}$ order without requiring the size of the ensemble to scale linearly with $T$, a requirement that defeats the purpose of ensemble sampling. Ours is also the first result that allows infinite action sets.
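
For concreteness, a minimal sketch of a generic ensemble sampling loop for a linear bandit is given below. It is an illustration under stated assumptions and not necessarily the exact variant analysed here: the action set is finite (the result itself allows infinite action sets), the noise and perturbations are Gaussian, and the function name `ensemble_sampling` together with the parameters `lam` and `noise_sd` are hypothetical. Each of the `m` ensemble members keeps its own randomly perturbed regularised least-squares estimate; in every round one member is drawn uniformly at random and the action that is greedy for that member's estimate is played.

```python
import numpy as np


def ensemble_sampling(actions, theta_star, T, m, lam=1.0, noise_sd=1.0, seed=0):
    """Illustrative ensemble sampling on a finite-action linear bandit.

    actions:    (K, d) array of feature vectors (finite set, an assumption).
    theta_star: (d,) true parameter, used only to simulate rewards and regret.
    Returns the cumulative regret after each of the T rounds.
    """
    rng = np.random.default_rng(seed)
    _, d = actions.shape

    # Regularised design matrix shared by all ensemble members.
    V = lam * np.eye(d)
    # Row j holds member j's perturbed response vector; the initial value
    # corresponds to drawing member j's starting estimate from N(0, I / lam).
    B = np.sqrt(lam) * rng.standard_normal((m, d))

    best = (actions @ theta_star).max()
    regret = np.zeros(T)
    for t in range(T):
        j = rng.integers(m)                    # pick a member uniformly at random
        theta_j = np.linalg.solve(V, B[j])     # that member's current estimate
        x = actions[int(np.argmax(actions @ theta_j))]  # greedy action for it

        y = x @ theta_star + noise_sd * rng.standard_normal()  # noisy reward

        # Every member is updated with the observed reward plus its own
        # independent perturbation, which keeps the members diverse.
        V += np.outer(x, x)
        B += np.outer(y + noise_sd * rng.standard_normal(m), x)

        regret[t] = best - x @ theta_star
    return np.cumsum(regret)


if __name__ == "__main__":
    d, T = 5, 2000
    rng = np.random.default_rng(1)
    acts = rng.standard_normal((50, d))
    theta = rng.standard_normal(d)
    m = int(np.ceil(d * np.log(T)))  # ensemble size of order d log T; constant 1 is an assumption
    print(ensemble_sampling(acts, theta, T, m)[-1])
```

The per-round cost of this sketch grows with `m` but not with `T`, which is why an ensemble size of order $\smash{d \log T}$, rather than one scaling linearly with $T$, matters in practice.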