We consider a variant of the best arm identification task in stochastic multi-armed bandits. Motivated by risk-averse decision-making problems, our goal is to identify, within a fixed budget, a set of $m$ arms with the highest $\tau$-quantile values. We prove asymmetric two-sided concentration inequalities for order statistics and quantiles of random variables with non-decreasing hazard rate, which may be of independent interest. Using these inequalities, we analyse a quantile version of Successive Accepts and Rejects (Q-SAR). We derive an upper bound on the probability of arm misidentification, which is the first theoretical justification of a quantile-based algorithm for fixed-budget identification of multiple best arms. We also present illustrative experiments for best arm identification.
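To make the setting concrete, the following is a minimal sketch of how a quantile-based Successive Accepts and Rejects procedure could look. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the phase lengths follow the usual mean-based SAR schedule, the $\tau$-quantile is estimated with `numpy.quantile` (the estimator actually analysed may differ), and the function name `q_sar_sketch`, the sampler interface, and the exponential test arms are all hypothetical.

```python
import math
import numpy as np


def q_sar_sketch(samplers, m, tau, budget, rng):
    """Return sorted indices of the m arms judged to have the highest tau-quantiles.

    samplers: list of callables (rng, size) -> array of rewards (assumed interface).
    """
    K = len(samplers)
    assert 1 <= m <= K and budget > K
    # Phase schedule borrowed from standard SAR (assumption, see lead-in).
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    active = list(range(K))
    accepted = []
    samples = [[] for _ in range(K)]
    n_prev = 0
    for phase in range(1, K):
        n_k = math.ceil((budget - K) / (log_bar * (K + 1 - phase)))
        # Top up every active arm so that it has n_k samples in total.
        for arm in active:
            samples[arm].extend(samplers[arm](rng, n_k - n_prev))
        n_prev = n_k

        need = m - len(accepted)  # arms still to be accepted
        if need == 0 or need == len(active):
            break  # the remaining decisions are forced

        # Empirical tau-quantiles of the active arms, best first.
        q = {arm: float(np.quantile(samples[arm], tau)) for arm in active}
        ranked = sorted(active, key=lambda a: q[a], reverse=True)
        q_in = q[ranked[need]]       # (need + 1)-th best estimate
        q_out = q[ranked[need - 1]]  # need-th best estimate
        gaps = [q[a] - q_in if i < need else q_out - q[a]
                for i, a in enumerate(ranked)]
        i_star = max(range(len(ranked)), key=lambda i: gaps[i])
        arm = ranked[i_star]
        active.remove(arm)
        if i_star < need:
            accepted.append(arm)  # confidently inside the top set
        # otherwise the arm is rejected and simply dropped

    # Either no arm is still needed, or all remaining arms are needed.
    if m - len(accepted) == len(active):
        accepted.extend(active)
    return sorted(accepted)


# Hypothetical example: exponential rewards (constant, hence non-decreasing,
# hazard rate). The median of an exponential with scale s is s * ln 2.
scales = [1.0, 1.5, 2.0, 6.0, 8.0]
samplers = [lambda rng, size, s=s: rng.exponential(s, size) for s in scales]
rng = np.random.default_rng(0)
chosen = q_sar_sketch(samplers, m=2, tau=0.5, budget=5000, rng=rng)
print(chosen)
```

In each phase the sketch deactivates the arm whose estimated quantile lies furthest from the current accept/reject boundary, accepting it if it ranks among the arms still needed and rejecting it otherwise. The total number of pulls stays within `budget` because the schedule is the standard SAR one.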