Standard approaches to decision-making under uncertainty focus on sequential exploration of the space of decisions. However, \textit{simultaneously} proposing a batch of decisions, which leverages available resources for parallel experimentation, has the potential to rapidly accelerate exploration. We present a family of (parallel) contextual bandit algorithms, applicable to problems with bounded eluder dimension, whose regret is nearly identical to that of their perfectly sequential counterparts, given access to the same total number of oracle queries, up to a lower-order ``burn-in'' term. We further show that these algorithms can be specialized to the class of linear reward functions, where we introduce and analyze several new linear bandit algorithms that explicitly introduce diversity into their action selection. Finally, we present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design problems, to demonstrate the utility of parallelized bandits in practical settings.