Best arm identification (BAI) aims to identify the highest-performing arm among a set of $K$ arms by collecting stochastic samples from each arm. In real-world problems, the best arm must also satisfy additional feasibility constraints. The limited prior work on BAI with feasibility constraints typically assumes that performance and constraints are observed simultaneously on each pull of an arm. However, this assumption does not reflect most practical use cases. For example, in drug discovery, we wish to find the most potent drug whose toxicity and solubility are below certain safety thresholds, and these safety experiments can be conducted separately from the potency measurement. This calls for BAI algorithms that decide not only which arm to pull but also whether to test the arm's performance or its feasibility. In this work, we study feasible BAI, in which a decision-maker chooses a tuple $(i,\ell)$, where $i\in [K]$ denotes an arm and $\ell$ denotes whether she wishes to test its performance ($\ell=0$) or one of its $N$ feasibility constraints ($\ell\in[N]$). We focus on the fixed-confidence setting, where the goal is to identify the feasible arm with the highest performance with probability at least $1-\delta$. We propose an efficient algorithm and upper-bound its sample complexity, showing that it naturally adapts to the problem's difficulty and eliminates each arm through either worse performance or infeasibility, whichever is easier to establish. We complement this upper bound with a lower bound showing that our algorithm is \textit{asymptotically ($\delta\rightarrow 0$) optimal}. Finally, we empirically show that our algorithm outperforms other state-of-the-art BAI algorithms on both synthetic and real-world datasets.
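To make the interaction protocol concrete, the following is a minimal sketch of a toy environment in which each pull is a tuple $(i,\ell)$. It is not the algorithm proposed in this work. The class name `FeasibleBanditEnv`, its methods, the Gaussian noise model, and the rule that an arm is feasible when every constraint mean is at or below its threshold are all illustrative assumptions; arms are indexed from 0 here for convenience, whereas the text uses $i\in[K]$.

```python
import random


class FeasibleBanditEnv:
    """Hypothetical toy environment for feasible BAI (illustration only).

    Pulling (i, l) returns one noisy sample:
      l == 0      -> performance of arm i
      l in 1..N   -> the l-th feasibility constraint of arm i
    Gaussian noise and the "mean <= threshold" rule are assumptions.
    """

    def __init__(self, perf_means, cons_means, thresholds, noise_std=1.0, seed=0):
        self.perf_means = list(perf_means)                    # K performance means
        self.cons_means = [list(row) for row in cons_means]   # K rows of N constraint means
        self.thresholds = list(thresholds)                    # N thresholds
        self.noise_std = noise_std
        self.rng = random.Random(seed)
        self.num_pulls = 0                                    # total samples drawn so far

    def pull(self, i, l):
        """Draw one sample for arm i (0-based) and test l (0 = performance)."""
        mean = self.perf_means[i] if l == 0 else self.cons_means[i][l - 1]
        self.num_pulls += 1
        return self.rng.gauss(mean, self.noise_std)

    def is_feasible(self, i):
        """Ground truth: arm i meets every constraint (assumed rule)."""
        return all(m <= t for m, t in zip(self.cons_means[i], self.thresholds))

    def feasible_best_arm(self):
        """Ground truth: highest-performance arm among the feasible ones, or None."""
        feasible = [i for i in range(len(self.perf_means)) if self.is_feasible(i)]
        if not feasible:
            return None
        return max(feasible, key=lambda i: self.perf_means[i])


# Three arms, one constraint: arm 0 performs best but violates the threshold.
env = FeasibleBanditEnv(
    perf_means=[0.9, 0.7, 0.5],
    cons_means=[[0.8], [0.2], [0.1]],
    thresholds=[0.5],
)
potency_sample = env.pull(0, 0)    # test arm 0's performance
toxicity_sample = env.pull(0, 1)   # test arm 0's first constraint
print(env.feasible_best_arm())     # arm 1 is the feasible best arm
```

In this toy instance, arm 0 can be ruled out by sampling only its constraint, while arm 2 can be ruled out either by its lower performance or not at all by feasibility, which illustrates why choosing $\ell$ per pull matters for sample complexity.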