We study the problem of best-arm identification (BAI) in the fixed-budget setting with heterogeneous reward variances. We propose two variance-adaptive BAI algorithms for this setting: SHVar for known reward variances and SHAdaVar for unknown reward variances. Both algorithms allocate the budget non-uniformly among the arms, so that arms with higher reward variances are pulled more often than those with lower variances. The main algorithmic novelty is in the design of SHAdaVar, which allocates budget greedily based on overestimates of the unknown reward variances. We bound the probability of misidentifying the best arm in both SHVar and SHAdaVar. Our analyses rely on novel lower bounds on the number of pulls of an arm that do not require closed-form solutions to the budget allocation problem. Since one of our budget allocation problems is analogous to optimal experiment design with unknown variances, we believe that our results are of broad interest. Our experiments validate our theory and show that SHVar and SHAdaVar outperform prior algorithms with analytical guarantees.
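To make the idea of variance-adaptive budget allocation concrete, the following is a minimal Python sketch for the known-variance case. It is not the SHVar procedure as specified in the paper: the abstract gives no pseudocode, so the greedy rule (pull the arm with the largest variance-to-pulls ratio), the equal per-stage budget, the halving schedule, the Gaussian reward model, and the names `greedy_allocation` and `variance_adaptive_halving` are all assumptions made for illustration.

```python
import math
import random


def greedy_allocation(variances, budget):
    """Split `budget` pulls among arms, favoring high-variance arms.

    Assumed rule: each pull goes to the arm whose mean estimate is
    currently noisiest, i.e. the largest variance / pulls ratio.
    Arms that have not been pulled yet are served first.
    """
    pulls = [0] * len(variances)
    for _ in range(budget):
        arm = max(
            range(len(variances)),
            key=lambda i: math.inf if pulls[i] == 0 else variances[i] / pulls[i],
        )
        pulls[arm] += 1
    return pulls


def variance_adaptive_halving(means, variances, budget, seed=0):
    """Sequential-halving sketch with known variances (hypothetical setup).

    `means` are the true arm means, used only to simulate Gaussian rewards.
    Returns the index of the arm that survives all elimination stages.
    """
    rng = random.Random(seed)
    active = list(range(len(means)))
    num_stages = max(1, math.ceil(math.log2(len(means))))
    stage_budget = budget // num_stages  # assumed: equal budget per stage

    for _ in range(num_stages):
        pulls = greedy_allocation([variances[i] for i in active], stage_budget)
        estimates = {}
        for arm, n in zip(active, pulls):
            std = math.sqrt(variances[arm])
            samples = [rng.gauss(means[arm], std) for _ in range(n)]
            # An arm that received no pulls cannot be ranked and is eliminated.
            estimates[arm] = sum(samples) / n if n else -math.inf
        # Keep the better half of the active arms by empirical mean.
        active.sort(key=estimates.get, reverse=True)
        active = active[: max(1, math.ceil(len(active) / 2))]
    return active[0]


if __name__ == "__main__":
    print(greedy_allocation([1.0, 4.0], 10))
    print(variance_adaptive_halving([0.0, 0.1, 0.2, 1.0], [0.1, 0.1, 0.4, 0.4], 400))
```

Under this greedy rule the pull counts end up roughly proportional to the variances, which equalizes the variance of the mean estimates across arms within a stage. The unknown-variance case described in the abstract would replace `variances` with overestimates computed from observed rewards; that step is not sketched here.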