In the secretary problem, a set of candidates arrives in a uniformly random order, and the candidates reveal their values one by one. A company, which can hire only one candidate and hopes to maximize the expected value of its hire, must make an irrevocable online decision about whether to hire each candidate as it arrives. The classical framework for evaluating a policy is to compute its worst-case competitive ratio against the optimal solution in hindsight; there the best policy, the ``$1/e$ law'', has a competitive ratio of $1/e$. We propose an alternative evaluation framework through the lens of regret: the worst-case additive difference between the optimal hindsight solution and the expected performance of the policy, assuming that each value is normalized to lie between $0$ and $1$. The $1/e$ law from the classical framework has a regret of $1 - 1/e \approx 0.632$; by contrast, we show that the class of ``pricing curves'' algorithms can guarantee a regret of at most $1/4 = 0.25$ (which is tight within the class), and the class of ``best-only pricing curves'' algorithms can guarantee a regret of at most $0.190$ (with a lower bound of $0.171$). In addition, we show that, in general, no policy can guarantee a regret better than $0.152$. Finally, we discuss other objectives in our regret-minimization framework, such as selecting the top $k$ candidates for $k > 1$, or maximizing revenue during the selection process.
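To make the $1 - 1/e$ regret figure for the $1/e$ law concrete, the following is a minimal Python sketch, not the authors' code. It assumes the textbook form of the rule (skip the first $\lfloor n/e \rfloor$ candidates, then hire the first one that beats all of them, falling back to the last candidate) and a hypothetical hard instance in which one candidate has value $1$ and all others have distinct values close to $0$. The function names, the instance, and the parameters `n`, `trials`, and `seed` are illustrative assumptions.

```python
import math
import random


def one_over_e_rule(values):
    """Classical rule: observe the first floor(n/e) candidates without hiring,
    then hire the first later candidate that beats all of them."""
    n = len(values)
    r = int(n / math.e)
    threshold = max(values[:r], default=float("-inf"))
    for v in values[r:]:
        if v > threshold:
            return v
    # Assumption: if nobody beats the sample, the last candidate is hired.
    return values[-1]


def prob_best(n):
    """Exact probability that the rule hires the best of n candidates
    with distinct values (standard formula, valid when floor(n/e) >= 1)."""
    r = int(n / math.e)
    return (r / n) * sum(1.0 / (i - 1) for i in range(r + 1, n + 1))


def estimate_regret(n=100, trials=20000, seed=0):
    """Monte Carlo estimate of (hindsight optimum - expected hire value)
    on one illustrative instance: a single value 1, the rest distinct and near 0."""
    rng = random.Random(seed)
    values = [1.0] + [1e-6 * i / n for i in range(1, n)]
    best = max(values)
    total = 0.0
    for _ in range(trials):
        rng.shuffle(values)  # uniformly random arrival order
        total += one_over_e_rule(values)
    return best - total / trials


if __name__ == "__main__":
    print("P(hire best), n=1000:", prob_best(1000))
    print("estimated regret, n=100:", estimate_regret())
    print("1 - 1/e:", 1 - 1 / math.e)
```

On this instance the rule earns almost nothing unless it hires the best candidate, which happens with probability close to $1/e$, so the estimated regret should land near $1 - 1/e$. The small values must be distinct: if they were all exactly $0$, the strict comparison would let the rule hire the value-$1$ candidate whenever it arrives after the sampling phase, and the instance would no longer be hard.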