Maximum mean discrepancy (MMD) flows suffer from high computational costs in large-scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = - \Vert x-y\Vert^r$, $r \in (0,2)$, have exceptional properties that allow their efficient computation. We prove that the MMD of Riesz kernels coincides with the MMD of their sliced version. As a consequence, gradients of MMDs can be computed in the one-dimensional setting. There, for $r=1$, a simple sorting algorithm reduces the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As a further consequence, the MMD of compactly supported measures can be bounded from above and below in terms of the Wasserstein-1 distance. In our implementation, we approximate the gradient of the sliced MMD using only a finite number $P$ of slices. We show that the resulting error is of order $O(\sqrt{d/P})$, where $d$ is the data dimension. These results enable us to train generative models by approximating MMD gradient flows with neural networks, even for image applications. We demonstrate the efficiency of our model by image generation on MNIST, FashionMNIST and CIFAR10.
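To illustrate the sorting idea for $r=1$ in one dimension, the following is a minimal sketch and not the implementation used in the paper. It assumes uniform weights on $N$ points `x` and $M$ points `y`, and the normalization $\mathrm{MMD}^2 = \mathbb{E}K(x,x') + \mathbb{E}K(y,y') - 2\,\mathbb{E}K(x,y)$ with $K(x,y) = -|x-y|$; the paper's functional may differ by a constant factor. The function names `riesz_mmd2_1d` and `riesz_mmd2_grad_1d` are hypothetical. After sorting, all pairwise sums reduce to rank and prefix-sum computations, so no $N \times N$ or $N \times M$ distance matrix is formed.

```python
import numpy as np


def riesz_mmd2_1d(x, y):
    """Squared MMD for K(x, y) = -|x - y| between uniform empirical
    measures on x (N points) and y (M points), in O((M+N) log(M+N)).
    Normalization is an assumption of this sketch (see lead-in)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = x.size, y.size
    xs, ys = np.sort(x), np.sort(y)

    # sum_{i,j} |x_i - x_j| = 2 * sum_k (2k - n + 1) * x_(k), k = 0..n-1
    xx = 2.0 * np.dot(2 * np.arange(n) - n + 1, xs)
    yy = 2.0 * np.dot(2 * np.arange(m) - m + 1, ys)

    # Cross term: for each x_i, split y into points <= x_i and points > x_i.
    prefix = np.concatenate(([0.0], np.cumsum(ys)))
    c = np.searchsorted(ys, x, side="right")  # number of y_j <= x_i
    xy = np.sum(x * c - prefix[c] + (prefix[-1] - prefix[c]) - x * (m - c))

    return -xx / n**2 - yy / m**2 + 2.0 * xy / (n * m)


def riesz_mmd2_grad_1d(x, y):
    """Gradient of riesz_mmd2_1d with respect to the points x.
    Uses sign(0) = 0 at ties, which is one choice of subgradient."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m = x.size, y.size
    xs, ys = np.sort(x), np.sort(y)

    # sum_j sign(x_i - z_j) = #{z_j < x_i} - #{z_j > x_i}
    sign_xx = np.searchsorted(xs, x, side="left") - (
        n - np.searchsorted(xs, x, side="right")
    )
    sign_xy = np.searchsorted(ys, x, side="left") - (
        m - np.searchsorted(ys, x, side="right")
    )
    return -2.0 * sign_xx / n**2 + 2.0 * sign_xy / (n * m)
```

Both functions can be checked against the direct $O(MN+N^2)$ evaluation on small random inputs. In a sliced setting, one would apply them to the projections of the data onto each of the $P$ directions; the dimension-dependent constant relating the sliced and the full kernel is not included in this sketch.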