Variance-Optimal Arm Selection: Misallocation Minimization and Best Arm Identification

This paper studies the problem of selecting the arm with the highest variance from a set of $K$ independent arms. We consider two settings: (i) misallocation minimization (MM), which penalizes the number of pulls of arms that are suboptimal in terms of variance, and (ii) fixed-budget best arm identification (BAI), which evaluates the ability of an algorithm to identify the arm with the highest variance after a fixed number of pulls. For the MM setting, we develop a novel online algorithm called UCB-VV and show that, for bounded rewards, its misallocation is upper bounded by $\mathcal{O}\left(\log{n}\right)$, where $n$ is the horizon. By deriving a lower bound on the misallocation, we show that UCB-VV is order optimal. For the fixed-budget BAI setting, we propose the SHVV algorithm. We show that the upper bound on the error probability of SHVV decays as $\exp\left(-\frac{n}{\log(K) H}\right)$, where $H$ represents the complexity of the problem, and that this rate matches the corresponding lower bound. We extend the framework from bounded distributions to sub-Gaussian distributions using a novel concentration inequality on the sample variance and the sample standard deviation. Leveraging this inequality, we derive a concentration inequality for the empirical Sharpe ratio (SR) under sub-Gaussian distributions, which was previously unknown in the literature. Simulations show that UCB-VV consistently outperforms $\epsilon$-greedy across different sub-optimality gaps, although it is surpassed by VTS, which exhibits the lowest misallocation but lacks theoretical guarantees. We also illustrate the superior performance of SHVV over uniform sampling in the fixed-budget setting under six different setups. Finally, we conduct a case study that empirically evaluates the performance of UCB-VV and SHVV in call option trading on $100$ stocks whose prices are generated using geometric Brownian motion (GBM).
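To make the fixed-budget setting concrete, the following is a minimal sketch of a generic sequential-halving procedure that ranks arms by their sample variance. It is an illustration only: the abstract does not specify how SHVV works, so this should not be read as the authors' algorithm. The function names, the Gaussian test arms, the even budget split across rounds, and the use of fresh samples in each round are all assumptions made for this example.

```python
import math
import random
import statistics


def sequential_halving_by_variance(arms, budget):
    """Return the index of the arm with the highest estimated variance.

    Hypothetical sketch (not the paper's SHVV). `arms` is a list of
    zero-argument callables, each returning one reward per call.
    """
    if not arms:
        raise ValueError("need at least one arm")
    survivors = list(range(len(arms)))
    # About log2(K) elimination rounds; the budget is split evenly across them.
    rounds = max(1, math.ceil(math.log2(len(arms))))
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        # At least 2 pulls per arm so the sample variance is defined.
        # For very small budgets this floor can exceed the nominal budget.
        pulls = max(2, budget // (len(survivors) * rounds))
        # Fresh samples each round (an assumption; pooling is also possible).
        estimates = {
            i: statistics.variance([arms[i]() for _ in range(pulls)])
            for i in survivors
        }
        # Keep the half of the surviving arms with the largest sample variance.
        survivors.sort(key=estimates.get, reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


def make_gaussian_arm(sigma, rng):
    """Hypothetical test arm: zero-mean Gaussian rewards with std `sigma`."""
    return lambda: rng.gauss(0.0, sigma)


rng = random.Random(0)
arms = [make_gaussian_arm(s, rng) for s in (0.5, 1.0, 3.0, 1.5)]
best = sequential_halving_by_variance(arms, budget=4000)
print("selected arm:", best)
```

In this toy instance the third arm (index 2) has by far the largest variance, so with a budget of 4000 pulls the sketch should select it. Halving the candidate set each round concentrates the remaining pulls on the arms that are hardest to tell apart, which is the usual motivation for this family of fixed-budget methods.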
