We study the repeated optimal stopping problem, in which the same optimal stopping instance with an unknown distribution is solved repeatedly over $T$ rounds. We aim to simultaneously achieve strong per-round performance guarantees relative to a given baseline and sublinear regret across all rounds. Our primary contribution is a comprehensive theoretical characterization of whether and when these two objectives are compatible. First, under standard semi-bandit feedback, we prove that maintaining the per-round guarantee forces regret of $\Omega(T / \log T)$. Second, even under full feedback, we show that requiring almost-sure satisfaction of the per-round guarantee in every round is incompatible with sublinear regret. Third, under full feedback, we propose a general algorithmic framework that achieves both sublinear regret and the per-round guarantee with high probability. Our framework applies to canonical problems, including the prophet inequality, the secretary problem, and their variants under adversarial, random, and i.i.d. input models. For example, in the repeated prophet inequality problem, our method guarantees that, with high probability in each round, its expected reward is at least that of the classical single-sample algorithm, which achieves a $1/2$ competitive ratio, while simultaneously ensuring $\tilde{O}(\sqrt{T})$ regret. Furthermore, we establish a regret lower bound of $\Omega(\sqrt{T})$ even in the i.i.d. model, which is nearly tight with respect to the number of rounds.
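For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of the classical single-sample rule for the prophet inequality as it is usually stated: draw one sample from each distribution, set the threshold to the largest sample, and accept the first arriving value that exceeds it. This is only the single-round baseline, not the repeated framework proposed in the paper. The function names `single_sample_stop` and `estimate_ratio`, the choice of i.i.d. uniform values on $[0, 1]$, and the trial count are illustrative assumptions.

```python
import random


def single_sample_stop(samples, values):
    """Run the single-sample threshold rule on one round.

    samples: one independent draw per distribution (used only for the threshold).
    values:  the actual arrivals, observed in order.
    Returns the accepted value, or 0.0 if nothing exceeds the threshold.
    """
    threshold = max(samples)
    for v in values:
        if v > threshold:  # accept the first value above the threshold
            return v
    return 0.0


def estimate_ratio(n=5, trials=20000, seed=0):
    """Monte Carlo estimate of E[algorithm] / E[prophet].

    Assumption for illustration: n i.i.d. Uniform[0, 1] values per round.
    """
    rng = random.Random(seed)
    alg_total = 0.0
    prophet_total = 0.0
    for _ in range(trials):
        samples = [rng.random() for _ in range(n)]
        values = [rng.random() for _ in range(n)]
        alg_total += single_sample_stop(samples, values)
        prophet_total += max(values)  # the prophet always takes the maximum
    return alg_total / prophet_total


if __name__ == "__main__":
    print(f"estimated competitive ratio: {estimate_ratio():.3f}")
```

On this toy instance the estimated ratio should land somewhat above the worst-case $1/2$ guarantee, since uniform i.i.d. inputs are not an adversarial case.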