Pandora's Box is a fundamental stochastic optimization problem in which a decision-maker must find a good alternative while minimizing the search cost of exploring the value of each alternative. The original formulation assumes that accurate distributions are given for the values of all alternatives, while recent work studies an online variant of Pandora's Box in which the distributions are initially unknown. In this work, we study Pandora's Box in the online setting while incorporating context. At every round, we are presented with a number of alternatives, each with a context, an exploration cost, and an unknown value drawn from an unknown distribution that may change from round to round. Our main result is a no-regret algorithm that performs comparably to the optimal algorithm, which knows all prior distributions exactly. Our algorithm works even in the bandit setting, in which it never learns the values of the alternatives that it did not explore. The key technique that enables our result is a novel modification of the realizability condition in contextual bandits that connects a context to a sufficient statistic of each alternative's distribution (its "reservation value") rather than to its mean.
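To make the notion of a reservation value concrete, the following minimal sketch computes it for a discrete distribution and runs the classical index policy for the known-distribution case. It is an illustration under stated assumptions, not the algorithm of this work: it adopts the reward-maximization convention of Weitzman's classical formulation, in which the reservation value sigma solves E[max(V - sigma, 0)] = cost (a cost-minimization formulation mirrors this condition), it assumes an outside option worth 0, and it does not implement the contextual learning component. The names `expected_excess`, `reservation_value`, and `weitzman_search` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[Sequence[float], Sequence[float], float]  # (values, probabilities, cost)


def expected_excess(values: Sequence[float], probs: Sequence[float], sigma: float) -> float:
    """Return E[max(V - sigma, 0)] for a discrete distribution."""
    return sum(p * max(v - sigma, 0.0) for v, p in zip(values, probs))


def reservation_value(values: Sequence[float], probs: Sequence[float],
                      cost: float, tol: float = 1e-9) -> float:
    """Solve E[max(V - sigma, 0)] = cost for sigma by bisection (assumes cost > 0)."""
    lo = min(values) - cost  # here the expected excess is at least cost
    hi = max(values)         # here the expected excess is 0, which is below cost
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        # The expected excess is non-increasing in sigma, so bisection applies.
        if expected_excess(values, probs, mid) > cost:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def weitzman_search(boxes: List[Box], realized: Sequence[float]) -> Tuple[float, List[int]]:
    """Open boxes in decreasing order of reservation value.

    Stops once the best value seen so far is at least the next reservation
    value. Assumes an outside option worth 0 (an assumption of this sketch).
    Returns (best value minus total cost paid, indices of opened boxes).
    """
    sigmas = [reservation_value(v, p, c) for v, p, c in boxes]
    order = sorted(range(len(boxes)), key=lambda i: sigmas[i], reverse=True)
    best, paid, opened = 0.0, 0.0, []
    for i in order:
        if best >= sigmas[i]:
            break  # no remaining box is worth its exploration cost
        paid += boxes[i][2]
        opened.append(i)
        best = max(best, realized[i])
    return best - paid, opened


if __name__ == "__main__":
    boxes = [([0.0, 10.0], [0.5, 0.5], 1.0), ([4.0, 6.0], [0.5, 0.5], 0.5)]
    print(weitzman_search(boxes, realized=[0.0, 6.0]))
```

In this toy instance the first box has reservation value 8 (since 0.5 * (10 - 8) = 1) and the second has reservation value 5, so the first box is opened first even though both distributions have mean 5. This is why an index of this kind, rather than the mean, is the natural quantity to tie to the context.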