The Gromov-Wasserstein (GW) distance quantifies dissimilarity between metric measure spaces and provides a meaningful figure of merit for applications involving heterogeneous data. While computational aspects of the GW distance have been widely studied, a strong duality theory and fundamental statistical questions concerning empirical convergence rates have remained obscure. This work closes these gaps for the $(2,2)$-GW distance (namely, with quadratic cost) over Euclidean spaces of different dimensions $d_x$ and $d_y$. We consider both the standard GW and the entropic GW (EGW) distances, derive their dual forms, and use them to analyze expected empirical convergence rates. The resulting rates are $n^{-2/\max\{d_x,d_y,4\}}$ (up to a log factor when $\max\{d_x,d_y\}=4$) and $n^{-1/2}$ for the two-sample GW and EGW problems, respectively, which match the corresponding rates for standard and entropic optimal transport distances. We also study stability of EGW in the entropic regularization parameter and establish approximation and continuity results for the cost and optimal couplings. Lastly, the duality is leveraged to shed new light on the open problem of the one-dimensional GW distance between uniform distributions on $n$ points, illuminating why the identity and anti-identity permutations may not be optimal. Our results serve as a first step towards a comprehensive statistical theory as well as computational advancements for GW distances, based on the discovered dual formulation.
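To make the one-dimensional question concrete, the following is a minimal sketch, not the paper's method: it evaluates the quadratic-cost GW objective between two uniform distributions on $n$ points of the real line and searches all permutation couplings by brute force. The function names `gw_cost_perm` and `brute_force_gw` are hypothetical, the restriction to permutation couplings is an assumption of this sketch, and the exhaustive search is only feasible for very small $n$.

```python
import itertools

import numpy as np


def gw_cost_perm(x, y, perm):
    """Quadratic-cost GW objective for the coupling matching x[i] to y[perm[i]].

    Both point sets carry uniform weights 1/n (an assumption of this sketch).
    The returned value is the squared objective; no square root is taken.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)[list(perm)]
    dx = (x[:, None] - x[None, :]) ** 2  # squared pairwise distances among the x points
    dy = (y[:, None] - y[None, :]) ** 2  # squared pairwise distances among the matched y points
    # Average the squared distortion over all n^2 index pairs (i, j).
    return float(np.mean((dx - dy) ** 2))


def brute_force_gw(x, y):
    """Return the best permutation and its cost by exhaustive search over n! candidates."""
    n = len(x)
    best = min(
        itertools.permutations(range(n)),
        key=lambda p: gw_cost_perm(x, y, p),
    )
    return best, gw_cost_perm(x, y, best)


def identity_vs_best(x, y):
    """Compare identity and anti-identity matchings (on sorted points) with the brute-force optimum."""
    x, y = np.sort(x), np.sort(y)
    n = len(x)
    identity = tuple(range(n))
    anti_identity = tuple(reversed(range(n)))
    _, best_cost = brute_force_gw(x, y)
    return (
        gw_cost_perm(x, y, identity),
        gw_cost_perm(x, y, anti_identity),
        best_cost,
    )
```

Calling `identity_vs_best(x, y)` on small sorted samples returns the three costs side by side, so one can check numerically whether either monotone matching attains the brute-force minimum for a given pair of point sets.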