In this paper we revisit fixed-confidence identification of the Pareto optimal set in a multi-objective multi-armed bandit model. Because the sample complexity of identifying the exact Pareto set can be very large, prior work has studied a relaxation that allows the output to include some additional near-optimal arms. In this work we also tackle alternative relaxations that instead allow identifying a relevant subset of the Pareto set. Notably, we propose a single sampling strategy, called Adaptive Pareto Exploration, that can be combined with different stopping rules to accommodate different relaxations of the Pareto Set Identification problem. We analyze the sample complexity of these combinations, quantifying in particular the reduction in sample complexity that occurs when one seeks to identify at most $k$ Pareto optimal arms. We showcase the good practical performance of Adaptive Pareto Exploration in a real-world scenario, in which we adaptively explore several vaccination strategies against Covid-19 to find the optimal ones when multiple immunogenicity criteria are taken into account.
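To make the target of identification concrete, the following minimal sketch computes the Pareto optimal set from known mean vectors and checks a candidate answer under an at-most-$k$ relaxation. It is an illustration only, not the Adaptive Pareto Exploration strategy or its stopping rules. The dominance convention (larger is better, at least as good on every objective and strictly better on one), the reading of the at-most-$k$ relaxation, the function names `pareto_set` and `is_valid_k_answer`, and the example means are all assumptions introduced here.

```python
import numpy as np


def pareto_set(means: np.ndarray) -> list[int]:
    """Indices of arms whose mean vector is not dominated by any other arm.

    Assumed convention: arm j dominates arm i if j is at least as good on
    every objective and strictly better on at least one (larger is better).
    """
    n_arms = means.shape[0]
    optimal = []
    for i in range(n_arms):
        dominated = any(
            j != i
            and np.all(means[j] >= means[i])
            and np.any(means[j] > means[i])
            for j in range(n_arms)
        )
        if not dominated:
            optimal.append(i)
    return optimal


def is_valid_k_answer(candidate: list[int], means: np.ndarray, k: int) -> bool:
    """Hypothetical check for an at-most-k relaxation.

    Assumed reading: an answer is acceptable if it contains only Pareto
    optimal arms and is either the whole Pareto set or exactly k of its arms.
    """
    optimal = set(pareto_set(means))
    chosen = set(candidate)
    if not chosen <= optimal:
        return False
    return chosen == optimal or len(chosen) == k


# Illustrative means: 4 arms, 2 objectives (made-up numbers).
example_means = np.array(
    [
        [0.9, 0.2],
        [0.5, 0.5],
        [0.2, 0.9],
        [0.4, 0.4],  # dominated by arm 1
    ]
)

if __name__ == "__main__":
    print(pareto_set(example_means))
    print(is_valid_k_answer([0, 2], example_means, k=2))
```

In a bandit setting the means are unknown and must be estimated from samples, which is where the sampling strategy and stopping rule come in; this sketch only fixes what a correct answer looks like once the means are given.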