The Gromov-Wasserstein (GW) distance, rooted in optimal transport (OT) theory, quantifies dissimilarity between metric measure spaces and provides a framework for aligning heterogeneous datasets. While computational aspects of the GW problem have been widely studied, a duality theory and fundamental statistical questions concerning empirical convergence rates have remained obscure. This work closes these gaps for the quadratic GW distance over Euclidean spaces of different dimensions $d_x$ and $d_y$. We treat both the standard and the entropically regularized GW (EGW) distance, and derive dual forms that represent them in terms of the well-understood OT and entropic OT (EOT) problems, respectively. This makes it possible to apply proof techniques from statistical OT, based on regularity analysis of dual potentials and empirical process theory, with which we establish the first GW empirical convergence rates. The derived two-sample rates are $n^{-2/\max\{\min\{d_x,d_y\},4\}}$ (up to a log factor when $\min\{d_x,d_y\}=4$) for standard GW and $n^{-1/2}$ for EGW, which match the corresponding rates for standard and entropic OT. The parametric rate for EGW is evidently optimal, while for standard GW we provide matching lower bounds, which establish sharpness of the derived rates. We also study stability of EGW in the entropic regularization parameter and prove approximation and continuity results for the cost and optimal couplings. Lastly, the duality is leveraged to shed new light on the open problem of the one-dimensional GW distance between uniform distributions on $n$ points, illuminating why the identity and anti-identity permutations may not be optimal. Our results serve as a first step towards a comprehensive statistical theory as well as computational advancements for GW distances, based on the discovered dual formulations.
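To make the stated rates concrete, the following minimal sketch evaluates the exponent in $n^{-2/\max\{\min\{d_x,d_y\},4\}}$ for given dimensions. The function names `gw_rate_exponent` and `gw_rate` are hypothetical helpers introduced here for illustration; constants and the log factor at $\min\{d_x,d_y\}=4$ are deliberately ignored, so the values indicate scaling only.

```python
def gw_rate_exponent(d_x: int, d_y: int) -> float:
    """Exponent a such that the stated standard-GW two-sample rate is n**(-a).

    Hypothetical helper: it only encodes the formula 2 / max(min(d_x, d_y), 4)
    from the abstract, ignoring constants and the log factor at min(d_x, d_y) = 4.
    """
    return 2.0 / max(min(d_x, d_y), 4)


def gw_rate(n: int, d_x: int, d_y: int) -> float:
    """Scaling n**(-a) of the stated standard-GW rate, without constants."""
    return n ** (-gw_rate_exponent(d_x, d_y))


# The stated EGW rate is parametric, n**(-1/2), regardless of dimension.
EGW_RATE_EXPONENT = 0.5

if __name__ == "__main__":
    for d_x, d_y in [(2, 10), (4, 4), (8, 6), (20, 30)]:
        print(d_x, d_y, gw_rate_exponent(d_x, d_y))
```

Because the exponent depends only on the smaller of the two dimensions, pairing a low-dimensional space ($\min\{d_x,d_y\}\le 4$) with an arbitrarily high-dimensional one still gives the exponent $1/2$ in this formula.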