We study the multi-fidelity multi-armed bandit (MF-MAB), an extension of the canonical multi-armed bandit (MAB) problem in which each arm can be pulled at different fidelities, each with its own cost and observation accuracy. We study two objectives: best arm identification with fixed confidence (BAI) and regret minimization. For BAI, we present (a) a cost complexity lower bound, (b) an algorithmic framework with two alternative fidelity selection procedures, and (c) cost complexity upper bounds for both procedures. From both the lower and upper cost complexity bounds of MF-MAB, one can recover the standard sample complexity bounds of the classic (single-fidelity) MAB. For regret minimization of MF-MAB, we propose a new regret definition and prove a problem-independent regret lower bound of $\Omega(K^{1/3}\Lambda^{2/3})$ and a problem-dependent lower bound of $\Omega(K\log \Lambda)$, where $K$ is the number of arms and $\Lambda$ is the decision budget in terms of cost. We then devise an elimination-based algorithm whose worst-case regret upper bound matches the corresponding lower bound up to logarithmic terms and whose problem-dependent regret upper bound matches the corresponding lower bound in terms of $\Lambda$.
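The abstract does not spell out the interaction model, so what follows is only a minimal sketch under our own assumptions. The class `MultiFidelityBandit`, the per-fidelity `costs` list, the per-arm, per-fidelity `means` table, and the helper `uniform_pulls` are hypothetical names, and round-robin sampling is a placeholder, not either of the paper's fidelity selection procedures. The sketch shows how pulls are charged against a cost budget $\Lambda$ instead of being counted, and why, with a single fidelity of unit cost, cost reduces to the number of samples.

```python
import random


class MultiFidelityBandit:
    """Hypothetical MF-MAB environment with Bernoulli rewards (illustrative assumption).

    means[k][m]: expected observed reward of arm k at fidelity m; how far this
                 sits from the arm's true mean models observation accuracy.
    costs[m]:    cost charged for one pull at fidelity m.
    """

    def __init__(self, means, costs, seed=0):
        self.means = means
        self.costs = costs
        self.spent = 0.0  # total cost consumed so far
        self.rng = random.Random(seed)

    def pull(self, arm, fidelity):
        # Charge this fidelity's cost, then draw a Bernoulli observation.
        self.spent += self.costs[fidelity]
        return 1.0 if self.rng.random() < self.means[arm][fidelity] else 0.0


def uniform_pulls(env, fidelity, budget):
    """Round-robin over all arms at one fixed fidelity until the next pull
    would push the total cost above `budget` (the decision budget Lambda)."""
    k = len(env.means)
    counts = [0] * k
    sums = [0.0] * k
    arm = 0
    while env.spent + env.costs[fidelity] <= budget:
        sums[arm] += env.pull(arm, fidelity)
        counts[arm] += 1
        arm = (arm + 1) % k
    return counts, sums


# Single fidelity with unit cost: the cost spent equals the number of pulls,
# which is how cost complexity collapses to ordinary sample complexity.
single = MultiFidelityBandit(means=[[0.3], [0.5], [0.7]], costs=[1.0])
counts, _ = uniform_pulls(single, fidelity=0, budget=30)
print(sum(counts), single.spent)  # 30 30.0

# Two fidelities: the cheaper one (cost 0.5) buys twice as many pulls
# from the same budget, at whatever accuracy `means` assigns to it.
multi = MultiFidelityBandit(means=[[0.25, 0.3], [0.45, 0.5]], costs=[0.5, 1.0])
counts, _ = uniform_pulls(multi, fidelity=0, budget=30)
print(sum(counts), multi.spent)  # 60 30.0
```

An actual MF-MAB algorithm would replace the fixed-fidelity round-robin loop with a rule that adaptively chooses both the arm and the fidelity; this sketch makes no attempt at that.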