The Sliced Wasserstein (SW) distance has become a popular alternative to the Wasserstein distance for comparing probability measures. Its applications include image processing, domain adaptation and generative modelling, where one commonly optimises some parameters in order to minimise SW, which serves as a loss function between discrete probability measures (since measures admitting densities are numerically unattainable). All these optimisation problems share the same sub-problem: minimising the Sliced Wasserstein energy. In this paper we study the properties of $\mathcal{E}: Y \longmapsto \mathrm{SW}_2^2(\gamma_Y, \gamma_Z)$, i.e. the squared SW distance between two uniform discrete measures with the same number of points, viewed as a function of the support $Y \in \mathbb{R}^{n \times d}$ of one of the measures. We investigate the regularity and optimisation properties of this energy and of its Monte-Carlo approximation $\mathcal{E}_p$ (which estimates the expectation in SW using only $p$ samples). We prove convergence results for the critical points of $\mathcal{E}_p$ towards those of $\mathcal{E}$, as well as almost-sure uniform convergence and a uniform Central Limit result for the process $\mathcal{E}_p(Y)$. Finally, we show that, in a certain sense, Stochastic Gradient Descent methods minimising $\mathcal{E}$ and $\mathcal{E}_p$ converge towards (Clarke) critical points of these energies.
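To make the Monte-Carlo energy $\mathcal{E}_p$ concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that the expectation in SW is taken over directions drawn uniformly on the unit sphere, and it uses the standard fact that the one-dimensional 2-Wasserstein distance between two uniform measures with $n$ points each is obtained by matching sorted projections. The function name `sw2_squared_mc` and its arguments are illustrative choices.

```python
import numpy as np


def sw2_squared_mc(Y, Z, p, rng):
    """Monte-Carlo estimate E_p(Y) of SW_2^2(gamma_Y, gamma_Z).

    Y, Z : arrays of shape (n, d), supports of two uniform discrete measures.
    p    : number of random projection directions.
    rng  : a numpy.random.Generator used to draw the directions.
    """
    n, d = Y.shape
    if Z.shape != (n, d):
        raise ValueError("Y and Z must have the same shape (n, d)")

    # Draw p directions uniformly on the unit sphere of R^d
    # by normalising standard Gaussian vectors (assumed convention).
    theta = rng.standard_normal((p, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both point clouds on every direction, then sort each column.
    # Resulting shape: (n, p).
    Y_proj = np.sort(Y @ theta.T, axis=0)
    Z_proj = np.sort(Z @ theta.T, axis=0)

    # In 1D, W_2^2 between uniform n-point measures is the mean squared
    # difference of the sorted samples; averaging over the p directions
    # gives the Monte-Carlo estimate.
    return float(np.mean((Y_proj - Z_proj) ** 2))
```

For instance, `sw2_squared_mc(Y, Z, p=200, rng=np.random.default_rng(0))` returns one realisation of $\mathcal{E}_p(Y)$; drawing fresh directions at every optimisation step corresponds to the stochastic-gradient setting mentioned in the abstract.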