We study a multi-objective pure exploration problem in a multi-armed bandit model. Each arm is associated with an unknown multivariate distribution, and the goal is to identify the distributions whose mean is not uniformly worse than that of another distribution: the Pareto optimal set. We propose and analyze the first algorithms for the \emph{fixed budget} Pareto Set Identification task. We introduce Empirical Gap Elimination (EGE), a family of algorithms that combines a careful estimation of the ``hardness to classify'' each arm in or out of the Pareto set with a generic elimination scheme. We prove that two particular instances, EGE-SR and EGE-SH, have a probability of error that decays exponentially fast with the budget, with an exponent supported by an information-theoretic lower bound. We complement these findings with an empirical study on real-world and synthetic datasets, which showcases the good performance of our algorithms.
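To make the target of the identification task concrete, the following minimal sketch computes the Pareto optimal set when the mean vectors are already known. It is not the EGE procedure, which has to estimate the means from a limited sampling budget. The function names `dominates` and `pareto_set` are hypothetical, and the sketch assumes that larger values are better in every objective and that an arm is dominated when another arm is at least as good in every objective and strictly better in at least one.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if mean vector `a` dominates mean vector `b`.

    Assumed convention: higher is better, and `a` dominates `b` when it is
    at least as good in every objective and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_set(means: Sequence[Sequence[float]]) -> List[int]:
    """Return the indices of the arms that no other arm dominates."""
    return [
        i
        for i, mean_i in enumerate(means)
        if not any(
            dominates(mean_j, mean_i)
            for j, mean_j in enumerate(means)
            if j != i
        )
    ]


# Illustrative (made-up) mean vectors for four arms and two objectives.
example_means = [(0.9, 0.1), (0.5, 0.5), (0.4, 0.4), (0.1, 0.9)]
print(pareto_set(example_means))  # prints [0, 1, 3]
```

In this example, arm 2 is excluded because arm 1 is better in both objectives, while arms 0, 1 and 3 each trade off one objective against the other. In the bandit setting the means are unknown, so this set has to be inferred from noisy samples.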