Approximating a probability distribution using a set of particles is a fundamental problem in machine learning and statistics, with applications including clustering and quantization. Formally, we seek a weighted mixture of Dirac measures that best approximates the target distribution. While much existing work relies on the Wasserstein distance to quantify approximation errors, maximum mean discrepancy (MMD) has received comparatively less attention, especially when allowing for variable particle weights. We argue that a Wasserstein-Fisher-Rao gradient flow is well-suited for designing quantizations optimal under MMD. We show that a system of interacting particles satisfying a set of ODEs discretizes this flow. We further derive a new fixed-point algorithm called mean shift interacting particles (MSIP). We show that MSIP extends the classical mean shift algorithm, widely used for identifying modes in kernel density estimators. Moreover, we show that MSIP can be interpreted as preconditioned gradient descent and that it acts as a relaxation of Lloyd's algorithm for clustering. Our unification of gradient flows, mean shift, and MMD-optimal quantization yields algorithms that are more robust than state-of-the-art methods, as demonstrated via high-dimensional and multi-modal numerical experiments.
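To make the two ingredients named in the abstract concrete, here is a minimal NumPy sketch of (i) the squared MMD between a weighted particle set and an empirical sample from the target, and (ii) one step of the classical mean shift update. It is not the MSIP algorithm or the Wasserstein-Fisher-Rao flow from the paper. The Gaussian kernel, the `bandwidth` parameter, and the function names `gaussian_kernel`, `mmd_squared`, and `mean_shift_step` are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel matrix k(x_i, y_j) for the rows of x and y (assumed kernel)."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(particles, weights, samples, bandwidth=1.0):
    """Squared MMD between sum_i w_i * delta_{x_i} and the empirical measure of samples."""
    k_xx = gaussian_kernel(particles, particles, bandwidth)
    k_xy = gaussian_kernel(particles, samples, bandwidth)
    k_yy = gaussian_kernel(samples, samples, bandwidth)
    # ||mu_Q - mu_P||^2 expanded in the RKHS: Q-Q term, cross term, P-P term.
    return weights @ k_xx @ weights - 2.0 * weights @ k_xy.mean(axis=1) + k_yy.mean()


def mean_shift_step(points, samples, bandwidth=1.0):
    """One classical mean shift update: move each point to the kernel-weighted mean of samples."""
    k = gaussian_kernel(points, samples, bandwidth)
    return (k @ samples) / k.sum(axis=1, keepdims=True)
```

Here the target distribution is replaced by a finite sample, so `mmd_squared` is only an empirical estimate. Iterating `mean_shift_step` moves each point toward a mode of the kernel density estimate built from `samples`, which is the classical behaviour the abstract says MSIP extends.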