We introduce weak barycenters of a family of probability distributions, based on the recently developed notion of optimal weak transport of mass by Gozlan et al. (2017) and Backhoff-Veraguas et al. (2020). We provide a theoretical analysis of this object and discuss its interpretation in light of the convex ordering between probability measures. In particular, we show that, rather than averaging the input distributions in a geometric way (as the Wasserstein barycenter based on classic optimal transport does), weak barycenters extract common geometric information shared by all the input distributions, encoded as a latent random variable that underlies all of them. We also provide an iterative algorithm to compute a weak barycenter for a finite family of input distributions, and a stochastic algorithm that computes weak barycenters for arbitrary populations of laws. The latter approach is particularly well suited to the streaming setting, i.e., when distributions are observed sequentially. The notion of weak barycenter and our approaches to computing it are illustrated on synthetic examples, validated on 2D real-world data, and compared to standard Wasserstein barycenters.
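The contrast between geometric averaging and convex ordering can be made concrete in one dimension. The following is a minimal sketch under stated assumptions, not the algorithms proposed in this work: the latent law, the symmetric-noise construction, and the helper names `wasserstein_barycenter_1d`, `is_dominated_in_convex_order`, and `add_symmetric_noise` are all hypothetical choices made for illustration. It relies on two standard facts: for equal-size 1D samples with uniform weights, the 2-Wasserstein barycenter is obtained by averaging the sorted samples; and for finitely supported 1D laws, convex order can be checked by comparing means and the functions k -> E[(X - k)+] at the atoms.

```python
import numpy as np


def wasserstein_barycenter_1d(samples):
    """2-Wasserstein barycenter of equal-size 1D samples (uniform weights).

    In 1D the barycenter's quantile function is the average of the input
    quantile functions, so it suffices to average the sorted samples.
    """
    return np.sort(np.asarray(samples, dtype=float), axis=1).mean(axis=0)


def is_dominated_in_convex_order(x, y, tol=1e-12):
    """Check whether the empirical law of x is below that of y in convex order.

    For finitely supported 1D laws this holds iff the means agree and
    E[(X - k)+] <= E[(Y - k)+] at every atom k of either law.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if abs(x.mean() - y.mean()) > tol:
        return False
    knots = np.union1d(x, y)
    call_x = np.maximum(x[None, :] - knots[:, None], 0.0).mean(axis=1)
    call_y = np.maximum(y[None, :] - knots[:, None], 0.0).mean(axis=1)
    return bool(np.all(call_x <= call_y + tol))


def add_symmetric_noise(x, scale):
    """Hypothetical toy noise: each atom x is split into x - scale and x + scale."""
    return np.concatenate([x - scale, x + scale])


# Assumed latent law: two atoms at -1 and +1 with equal mass.
latent = np.array([-1.0, 1.0])

# Two inputs: the same latent variable blurred by centered noise of different size.
inputs = [add_symmetric_noise(latent, 0.2), add_symmetric_noise(latent, 0.5)]

w2_barycenter = wasserstein_barycenter_1d(inputs)

print("W2 barycenter atoms:", w2_barycenter)
print("latent below every input in convex order:",
      all(is_dominated_in_convex_order(latent, y) for y in inputs))
print("W2 barycenter below first input in convex order:",
      is_dominated_in_convex_order(w2_barycenter, inputs[0]))
```

In this toy case the latent law is dominated in convex order by both inputs, whereas the Wasserstein barycenter averages the two noise levels and is more spread out than the less noisy input, so it is not dominated by it. This only illustrates the ordering the abstract refers to; it does not compute a weak barycenter.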