We study two-stage bipartite matching, in which the edges of a bipartite graph on vertices $(B_1 \cup B_2, I)$ are revealed in two batches. In stage one, a matching must be selected from among revealed edges $E \subseteq B_1 \times I$. In stage two, edges $E^\theta \subseteq B_2 \times I$ are sampled from a known distribution, and a second matching must be selected between $B_2$ and unmatched vertices in $I$. The objective is to maximize the total weight of the combined matching. We design polynomial-time approximations to the optimum online algorithm, achieving guarantees of $7/8$ for vertex-weighted graphs and $2\sqrt{2}-2 \approx 0.828$ for edge-weighted graphs under arbitrary distributions. Both approximation ratios match known upper bounds on the integrality gap of the natural fractional relaxation, improving upon the best-known approximation of 0.767 by Feng, Niazadeh, and Saberi for unweighted graphs whose second batch consists of independently arriving nodes. Our results are obtained via an algorithm that rounds a fractional matching revealed in two stages, aiming to match offline nodes (respectively, edges) with probability proportional to their fractional weights, up to a constant-factor loss. We leverage negative association (NA) among offline node availabilities -- a property induced by dependent rounding -- to derive new lower bounds on the expected size of the maximum weight matching in random graphs where one side is realized via NA binary random variables. Moreover, we extend these results to settings where we have only sample access to the distribution. In particular, $\text{poly}(n,\epsilon^{-1})$ samples suffice to obtain an additive loss of $\epsilon$ in the approximation ratio for the vertex-weighted problem; a similar bound holds for the edge-weighted problem with an additional (unavoidable) dependence on the scale of edge weights.
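To make the benchmark concrete, the following is a minimal brute-force sketch of the optimum online algorithm for the vertex-weighted problem on a toy instance. It is not the paper's polynomial-time rounding algorithm: it enumerates every stage-one matching, which takes exponential time. The function names, the representation of the distribution as a list of `(probability, edges)` scenarios, and the example instance are all assumptions made for illustration.

```python
from itertools import combinations

def matchings(edges):
    """Yield every matching (a tuple of edges) contained in a small edge list."""
    edges = list(edges)
    for k in range(len(edges) + 1):
        for subset in combinations(edges, k):
            online = [b for b, _ in subset]
            offline = [i for _, i in subset]
            # A matching uses each online and each offline vertex at most once.
            if len(set(online)) == len(online) and len(set(offline)) == len(offline):
                yield subset

def weight(matching, w):
    """Vertex-weighted objective: sum of the weights of matched offline nodes."""
    return sum(w[i] for _, i in matching)

def best_matching_weight(edges, w, blocked=frozenset()):
    """Maximum-weight matching that avoids offline nodes in `blocked` (brute force)."""
    usable = [(b, i) for b, i in edges if i not in blocked]
    return max(weight(m, w) for m in matchings(usable))

def optimum_online(stage1_edges, scenarios, w):
    """Return (expected value, stage-one matching) of the best two-stage policy.

    `scenarios` is a hypothetical explicit encoding of the known distribution:
    a list of (probability, stage-two edge list) pairs.
    """
    best_value, best_m1 = float("-inf"), ()
    for m1 in matchings(stage1_edges):
        used = frozenset(i for _, i in m1)
        # Stage two re-optimizes on the unmatched offline nodes in each scenario.
        stage2 = sum(p * best_matching_weight(e, w, used) for p, e in scenarios)
        value = weight(m1, w) + stage2
        if value > best_value:
            best_value, best_m1 = value, m1
    return best_value, best_m1

# Toy instance: B_1 = {a}, B_2 = {b}, I = {i1, i2}, unit weights.
w = {"i1": 1, "i2": 1}
stage1_edges = [("a", "i1"), ("a", "i2")]
scenarios = [(0.5, [("b", "i1")]), (0.5, [])]  # b arrives with an edge to i1 half the time
value, m1 = optimum_online(stage1_edges, scenarios, w)
print(value, m1)
```

In this instance, matching `a` to `i2` in stage one keeps `i1` free for the possible stage-two arrival, giving an expected value of 1.5. Matching `a` to `i1` instead yields only 1.0.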