We present the Fourier Sliced-Wasserstein (FSW) embedding, a novel method for embedding multisets and measures over $\mathbb{R}^d$ into Euclidean space. The proposed embedding approximately preserves the sliced Wasserstein distance on distributions, thereby yielding geometrically meaningful representations that better capture the structure of the input. Moreover, it is injective on measures and bi-Lipschitz on multisets, a significant advantage over prevalent methods based on sum- or max-pooling, which are provably not bi-Lipschitz and, in many cases, not even injective. The output dimension required for these guarantees is near-optimal: roughly $2Nd$, where $N$ is the maximal input multiset size. Furthermore, we prove that it is impossible to embed distributions over $\mathbb{R}^d$ into Euclidean space in a bi-Lipschitz manner; thus, the metric properties of our embedding are, in a sense, the best possible. Through numerical experiments, we demonstrate that our method yields superior multiset representations that improve performance in practical learning tasks. Specifically, we show that (a) a simple combination of the FSW embedding with an MLP achieves state-of-the-art performance in learning the (non-sliced) Wasserstein distance; and (b) replacing max-pooling with the FSW embedding makes PointNet significantly more robust to parameter reduction, incurring only minor performance degradation even after a 40-fold reduction.
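For readers unfamiliar with the distance our embedding approximately preserves, the following is a minimal Monte Carlo sketch of the sliced Wasserstein-2 distance between two equal-size multisets in $\mathbb{R}^d$. It is an illustration of the underlying metric only, not the FSW embedding itself; the function name, the number of projections, and the use of Gaussian-sampled unit directions are our own choices, not specifics from the paper.

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=200, seed=0):
    """Monte Carlo estimate of the sliced W2 distance between two
    equal-size multisets X, Y given as (N, d) arrays."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random directions on the unit sphere in R^d.
    theta = rng.normal(size=(n_proj, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project both multisets onto each direction and sort: for equal-size
    # multisets on the line, the W2 distance is the L2 distance between
    # sorted projections (the optimal coupling is monotone).
    px = np.sort(X @ theta.T, axis=0)  # (N, n_proj)
    py = np.sort(Y @ theta.T, axis=0)
    # Average squared cost over points and directions, then take the root.
    return np.sqrt(np.mean((px - py) ** 2))
```

For example, two identical multisets have distance zero, and the estimate is symmetric in its arguments since the same random directions are reused.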