Maximum mean discrepancy (MMD) flows suffer from high computational costs in large-scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = - \|x-y\|^r$, $r \in (0,2)$, have exceptional properties that allow them to be computed efficiently. We prove that the MMD of Riesz kernels, also known as the energy distance, coincides with the MMD of their sliced version. As a consequence, gradients of MMDs can be computed in the one-dimensional setting. There, for $r=1$, a simple sorting algorithm reduces the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As a further consequence, the MMD of compactly supported measures can be bounded from above and below in terms of the Wasserstein-1 distance. In our implementation, we approximate the gradient of the sliced MMD using only a finite number $P$ of slices. We show that the resulting error is of order $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows with neural networks, even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.
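To illustrate the sorting idea for $r=1$ in one dimension, the following is a minimal sketch and not the authors' implementation. It assumes the discrete functional $F(x) = -\frac{1}{2N^2}\sum_{i,j}|x_i-x_j| + \frac{1}{NM}\sum_{i,j}|x_i-y_j|$ for $N$ particles $x_i$ and $M$ fixed target points $y_j$; this normalization, the function names `grad_naive` and `grad_sorted`, and the convention $\operatorname{sign}(0)=0$ at ties are assumptions made for the example. Since $\partial_{x_i}F$ only involves sums of signs, each sum reduces to counting how many points lie below and above $x_i$, which sorting and binary search provide in $O((M+N)\log(M+N))$ overall.

```python
from bisect import bisect_left, bisect_right


def sign(t):
    """Sign of t, with sign(0) = 0 (assumed convention at ties)."""
    return (t > 0) - (t < 0)


def grad_naive(x, y):
    """Reference gradient of F with respect to each x[i], in O(N^2 + N*M)."""
    n, m = len(x), len(y)
    grad = []
    for xi in x:
        interaction = sum(sign(xi - xj) for xj in x) / n**2
        potential = sum(sign(xi - yj) for yj in y) / (n * m)
        grad.append(potential - interaction)
    return grad


def grad_sorted(x, y):
    """Same gradient via sorting and binary search, in O((M+N) log(M+N))."""
    n, m = len(x), len(y)
    xs, ys = sorted(x), sorted(y)
    grad = []
    for xi in x:
        # A sum of signs equals (# points strictly below xi)
        # minus (# points strictly above xi).
        sx = bisect_left(xs, xi) - (n - bisect_right(xs, xi))
        sy = bisect_left(ys, xi) - (m - bisect_right(ys, xi))
        grad.append(sy / (n * m) - sx / n**2)
    return grad


if __name__ == "__main__":
    particles = [0.0, 2.0]
    target = [1.0]
    print(grad_sorted(particles, target))
```

With particles at $0$ and $2$ and a single target point at $1$, the negative gradient points from each particle toward the target, so a descent step moves both particles inward. In the sliced setting described in the abstract, a routine of this kind would be applied to the one-dimensional projections of the data along each of the $P$ directions.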