Availability is all you need: achieving optimal regret with minimal information for dynamic matching

We study a centralized discrete-time dynamic two-way matching model with finitely many agent types. Agents arrive stochastically over time and join their type-dedicated queues waiting to be matched. We focus on availability-based policies that make matching decisions based solely on agent availability across types (i.e., whether queues are empty or not), rather than relying on complete queue-length information (e.g., the longest-queue policy). We aim to achieve constant regret at all times with optimal scaling in terms of the general position gap, $ε$, which measures the distance of the fluid relaxation from degeneracy. We classify availability-based policies into global and local policies based on the scope of information they utilize. First, for general networks (possibly cyclic), we propose a global availability-based policy, probabilistic matching, and prove that it achieves the optimal all-time regret scaling of $O(ε^{-1})$, matching the known lower bound established by [KAG24]. Second, for acyclic networks, we focus on the class of local availability-based policies, specifically static priority policies that prioritize matches based on a fixed order. Within this class, we derive the first explicit regret bound for the previously proposed tree priority policy, showing all-time regret scaling of $O(ε^{-(d+1)/2})$, where $d$ is the network depth. Next, we introduce a new truncated tree priority policy and prove that it is the first static priority policy to achieve the optimal all-time regret scaling of $O(ε^{-1})$. These policies are appealing for matching systems such as queueing and load balancing; they reduce operational costs by using minimal information while effectively balancing the trade-off between immediate and future rewards.
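To make the distinction between availability and queue-length information concrete, the following is a minimal sketch of a generic static priority rule that observes only whether each queue is non-empty. It is an illustration under stated assumptions, not the paper's probabilistic matching, tree priority, or truncated tree priority policy: the single-arrival-per-period model, the greedy matching loop, and the names `static_priority_match`, `simulate`, `arrival_probs`, `priority`, and `rewards` are all hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

Match = Tuple[str, str]  # a two-way match between two agent types


def static_priority_match(available: Dict[str, bool],
                          priority: List[Match]) -> Optional[Match]:
    """Return the first match in the fixed priority order whose two types
    are both available. Only empty/non-empty flags are read, never lengths."""
    for a, b in priority:
        if available[a] and available[b]:
            return (a, b)
    return None


def simulate(arrival_probs: Dict[str, float],
             priority: List[Match],
             rewards: Dict[Match, float],
             horizon: int,
             seed: int = 0) -> Tuple[float, Dict[str, int]]:
    """Assumed model: exactly one agent arrives per period, with its type
    drawn from arrival_probs; matches are then made greedily by priority."""
    rng = random.Random(seed)
    types = list(arrival_probs)
    weights = [arrival_probs[t] for t in types]
    queues = {t: 0 for t in types}
    total_reward = 0.0
    for _ in range(horizon):
        arriving = rng.choices(types, weights=weights)[0]
        queues[arriving] += 1
        while True:
            # The policy is handed availability flags only.
            available = {t: q > 0 for t, q in queues.items()}
            match = static_priority_match(available, priority)
            if match is None:
                break
            a, b = match
            queues[a] -= 1
            queues[b] -= 1
            total_reward += rewards[match]
    return total_reward, queues
```

For example, `simulate({"A": 0.5, "B": 0.3, "C": 0.2}, [("A", "B"), ("A", "C")], {("A", "B"): 1.0, ("A", "C"): 1.0}, 1000)` runs the rule on a hypothetical three-type acyclic network. A longest-queue rule would need the full `queues` dictionary at decision time, whereas this rule needs only the Boolean `available` flags.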
