Maximum mean discrepancy (MMD) flows suffer from high computational costs in large-scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = - \Vert x-y\Vert^r$, $r \in (0,2)$, have exceptional properties that allow their efficient computation. We prove that the MMD of Riesz kernels, also known as the energy distance, coincides with the MMD of their sliced version. As a consequence, gradients of MMDs can be computed in the one-dimensional setting. Here, for $r=1$, a simple sorting algorithm reduces the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be bounded from above and below in terms of the Wasserstein-1 distance. In our implementation, we approximate the gradient of the sliced MMD using only a finite number $P$ of slices, and we show that the resulting error is of order $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows with neural networks, even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.
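For $r=1$, the sorting idea and the slicing step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names `energy_1d`, `energy_1d_grad` and `sliced_energy_grad` are hypothetical, and the rank-counting trick (`np.sort` plus `np.searchsorted`) is just one way to obtain the $O((M+N)\log(M+N))$ cost per slice. The sketch assumes $N$ flow particles `x` and $M$ target samples `y`, and it rescales the averaged 1D gradients by $c_d = \mathbb{E}_\xi|\langle \xi, e_1\rangle| = \Gamma(d/2)/(\sqrt{\pi}\,\Gamma((d+1)/2))$. That constant comes from the identity $\Vert v\Vert = \mathbb{E}_\xi|\langle \xi, v\rangle| / c_d$ for $\xi$ uniform on the sphere.

```python
import math

import numpy as np


def energy_1d(x, y):
    """Squared MMD with K(s, t) = -|s - t| between 1D empirical measures x and y.

    Uses sorting: for sorted z of length n, sum_{a,b} |z_a - z_b| = 2 * sum_k z_(k) (2k - n + 1).
    """
    def pair_sum(z):
        z = np.sort(z)
        n = z.size
        k = np.arange(n)
        return 2.0 * np.dot(z, 2 * k - n + 1)

    n, m = x.size, y.size
    sxx, syy = pair_sum(x), pair_sum(y)
    # Cross term from the pooled sample: pair_sum(x ∪ y) = sxx + syy + 2 * sxy.
    sxy = 0.5 * (pair_sum(np.concatenate([x, y])) - sxx - syy)
    return 2.0 * sxy / (n * m) - sxx / n**2 - syy / m**2


def energy_1d_grad(x, y):
    """Gradient of energy_1d(x, y) w.r.t. the particles x (ties contribute sign 0)."""
    n, m = x.size, y.size
    xs, ys = np.sort(x), np.sort(y)
    # sum_b sign(x_i - x_b) = #{x_b < x_i} - #{x_b > x_i}, counted via binary search.
    sx = np.searchsorted(xs, x, side="left") + np.searchsorted(xs, x, side="right") - n
    sy = np.searchsorted(ys, x, side="left") + np.searchsorted(ys, x, side="right") - m
    return -2.0 * sx / n**2 + 2.0 * sy / (n * m)


def sliced_energy_grad(x, y, num_slices, rng):
    """Monte Carlo estimate of the d-dim Riesz (r = 1) MMD^2 gradient from P 1D slices."""
    n, d = x.shape
    xi = rng.standard_normal((num_slices, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)  # uniform directions on the sphere
    # c_d = E|<xi, e_1>|, so that ||v|| = E_xi |<xi, v>| / c_d.
    c_d = math.exp(math.lgamma(d / 2) - math.lgamma((d + 1) / 2)) / math.sqrt(math.pi)
    grad = np.zeros_like(x)
    for direction in xi:
        g = energy_1d_grad(x @ direction, y @ direction)  # one value per particle
        grad += np.outer(g, direction)  # lift the 1D gradient back to R^d
    return grad / (num_slices * c_d)
```

Each slice costs one sort of the projected particles plus binary searches, instead of the $O(MN+N^2)$ pairwise distances. The Monte Carlo average over `num_slices` directions plays the role of the finite number $P$ of slices in the error bound above.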