Dynamic resource allocation problems are ubiquitous, arising in inventory management, order fulfillment, online advertising, and other applications. We begin with one of the simplest models of online resource allocation: the multisecretary problem. In the multisecretary problem, a decision maker sequentially hires up to $B$ out of $T$ candidates, and candidate ability values are drawn i.i.d. from a distribution $F$ on $[0,1]$. First, we investigate fundamental limits on performance as a function of the value distribution under consideration. We quantify performance in terms of regret, defined as the additive loss relative to the best performance achievable in hindsight. We present a novel fundamental regret lower bound scaling as $\Omega(T^{1/2 - 1/(2(1 + \beta))})$ for distributions with gaps in their support, where $\beta$ quantifies the mass accumulation of types (values) around these gaps. This lower bound contrasts with the constant and logarithmic regret guarantees shown to be achievable in prior work under specific assumptions on the value distribution. Second, we introduce a novel algorithmic principle, Conservativeness with respect to Gaps (CwG), which yields near-optimal performance, with regret scaling as $\tilde{O}(T^{1/2 - 1/(2(1 + \beta))})$, for any distribution in a class parameterized by the mass accumulation parameter $\beta$. We then turn to operationalizing the CwG principle across dynamic resource allocation problems. We study a general and practical algorithm, Repeatedly Act using Multiple Simulations (RAMS), which simulates possible futures to estimate a hindsight-based approximation of the value-to-go function. We establish that this algorithm inherits the theoretical performance guarantees of algorithms tailored to the distribution of resource requests, including our CwG-based algorithm, and find that it outperforms them in numerical experiments.
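To make the regret notion above concrete, the following is a minimal simulation sketch of the multisecretary setting. It is not the CwG or RAMS algorithm from this work: it assumes, purely for illustration, that $F$ is Uniform$[0,1]$ and uses a naive fixed-threshold hiring rule as a baseline. The function names (`hindsight_value`, `threshold_policy_value`, `average_regret`) and the choice of threshold are hypothetical.

```python
import random


def hindsight_value(values, budget):
    """Best total value when all values are known in advance: the top `budget` values."""
    return sum(sorted(values, reverse=True)[:budget])


def threshold_policy_value(values, budget, threshold):
    """Online baseline (an assumption, not CwG): hire any candidate whose
    value is at least `threshold`, for as long as budget remains."""
    total, remaining = 0.0, budget
    for v in values:
        if remaining == 0:
            break
        if v >= threshold:
            total += v
            remaining -= 1
    return total


def average_regret(T, B, runs=200, seed=0):
    """Monte Carlo estimate of regret (hindsight value minus online value),
    assuming values are i.i.d. Uniform[0, 1]."""
    rng = random.Random(seed)
    # For Uniform[0, 1], a candidate clears this threshold with probability B / T.
    threshold = 1.0 - B / T
    total = 0.0
    for _ in range(runs):
        values = [rng.random() for _ in range(T)]
        total += hindsight_value(values, B) - threshold_policy_value(values, B, threshold)
    return total / runs


if __name__ == "__main__":
    for T in (100, 400, 1600):
        print(T, average_regret(T, B=T // 4))
```

Running the script prints the estimated regret of this baseline for a few horizons $T$ with $B = T/4$; comparing how that estimate grows with $T$ is one simple way to get a feel for the regret scalings discussed in the abstract.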