The max sliced Wasserstein (Max-SW) distance is widely regarded as a remedy for the weakly discriminative projections of the sliced Wasserstein (SW) distance. In applications that involve many independent pairs of probability measures, amortized projection optimization is used to predict the "max" projecting direction for two input measures, rather than running projected gradient ascent separately for each pair. Although efficient, Max-SW and its amortized version cannot guarantee the metricity property, because projected gradient ascent may be sub-optimal and amortization introduces an amortization gap. We therefore propose to replace Max-SW with the distributional sliced Wasserstein distance with a von Mises-Fisher (vMF) projecting distribution (v-DSW). Since v-DSW is a metric for any non-degenerate vMF distribution, its amortized version retains metricity under amortization. Furthermore, current amortized models are neither permutation invariant nor symmetric. To address this issue, we design amortized models based on the self-attention architecture. In particular, we adopt efficient self-attention architectures so that the computation is linear in the number of support points. With these two improvements, we derive self-attention amortized distributional projection optimization and show its appealing performance in point-cloud reconstruction and its downstream applications.
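To make the baseline concrete, the following is a minimal NumPy sketch of the SW distance and of Max-SW computed with projected gradient ascent, the two quantities the abstract starts from. It is an illustration under stated assumptions, not the authors' implementation: both point clouds are assumed to have the same number of points with uniform weights, the order is fixed to p = 2, the gradient is derived by hand for that case, and the function names (`projected_w2_sq`, `sliced_w2`, `max_sliced_w2`) are hypothetical. It covers neither the vMF projecting distribution nor amortization.

```python
import numpy as np


def projected_w2_sq(X, Y, theta):
    """Squared 1-D Wasserstein-2 distance between the projections of X and Y on theta.

    Assumes X and Y hold the same number of points with uniform weights,
    so the optimal 1-D coupling simply matches sorted projections.
    """
    x_proj = np.sort(X @ theta)
    y_proj = np.sort(Y @ theta)
    return np.mean((x_proj - y_proj) ** 2)


def sliced_w2(X, Y, n_projections=128, rng=None):
    """Monte Carlo estimate of SW_2 using directions drawn uniformly on the sphere."""
    rng = np.random.default_rng(rng)
    thetas = rng.normal(size=(n_projections, X.shape[1]))
    thetas /= np.linalg.norm(thetas, axis=1, keepdims=True)
    return np.sqrt(np.mean([projected_w2_sq(X, Y, t) for t in thetas]))


def max_sliced_w2(X, Y, n_steps=50, lr=0.1, rng=None):
    """Approximate Max-SW_2 by projected gradient ascent over the unit sphere.

    The result is only a lower bound on the true maximum: the ascent may stop
    at a sub-optimal direction, which is the issue the abstract points out.
    """
    rng = np.random.default_rng(rng)
    theta = rng.normal(size=X.shape[1])
    theta /= np.linalg.norm(theta)
    for _ in range(n_steps):
        # Pair points according to the 1-D optimal coupling for the current direction.
        ix, iy = np.argsort(X @ theta), np.argsort(Y @ theta)
        diff = X[ix] - Y[iy]
        # Gradient of mean((diff @ theta)**2) with the coupling held fixed.
        grad = 2.0 * np.mean((diff @ theta)[:, None] * diff, axis=0)
        # Ascent step followed by projection back onto the unit sphere.
        theta = theta + lr * grad
        theta /= np.linalg.norm(theta)
    return np.sqrt(projected_w2_sq(X, Y, theta)), theta
```

For a cloud and a translated copy of it, `max_sliced_w2` should recover the translation direction, while `sliced_w2` averages over all directions and therefore returns a smaller value. Each call to `max_sliced_w2` repeats the full ascent loop, which is the per-pair cost that amortized projection optimization is meant to avoid.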