Collections of probability distributions arise in a variety of applications, ranging from user activity pattern analysis to brain connectomics. In practice, these distributions can be defined over diverse domain types, including finite intervals, circles, cylinders, spheres, other manifolds, and graphs. This paper introduces an approach for detecting differences between two collections of distributions over such general domains. To this end, we propose an intrinsic slicing construction that yields a novel class of Wasserstein distances on manifolds and graphs. These distances are Hilbert embeddable, which allows us to reduce the problem of comparing distribution collections to the more familiar problem of testing for a difference in means in a Hilbert space. We provide two testing procedures: one based on resampling, and another based on combining p-values from coordinate-wise tests. Our experiments in various synthetic and real data settings show that the resulting tests are powerful and that their p-values are well calibrated.
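To make the reduction to mean testing concrete, the following is a minimal sketch and not the paper's method. It assumes the simplest domain, the real line, where the 2-Wasserstein distance between two distributions equals the L2 distance between their quantile functions, so each distribution can be embedded as a vector of quantiles. The intrinsic slicing construction for manifolds and graphs is not reproduced here. The function names `quantile_embedding` and `permutation_mean_test`, the grid size, and the squared-norm test statistic are all illustrative assumptions.

```python
import numpy as np


def quantile_embedding(samples, n_grid=50):
    """Embed a 1-D sample as its empirical quantile function on a fixed grid.

    Assumption: distributions live on the real line, where the L2 distance
    between quantile functions equals the 2-Wasserstein distance.
    """
    levels = (np.arange(n_grid) + 0.5) / n_grid  # midpoints in (0, 1)
    return np.quantile(np.asarray(samples, dtype=float), levels)


def permutation_mean_test(X, Y, n_perm=999, seed=0):
    """Resampling test for equal means of two sets of embedded vectors.

    X, Y: arrays of shape (n_x, d) and (n_y, d), one row per distribution.
    Returns a permutation p-value for the squared norm of the mean difference.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Z = np.vstack([X, Y])
    n = len(X)

    def stat(Z_perm):
        # Squared Euclidean norm of the difference between group means.
        diff = Z_perm[:n].mean(axis=0) - Z_perm[n:].mean(axis=0)
        return float(diff @ diff)

    observed = stat(Z)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Z))  # shuffle group labels
        if stat(Z[perm]) >= observed:
            exceed += 1
    # The +1 correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Hypothetical usage: two collections of 30 distributions, 100 draws each.
data_rng = np.random.default_rng(1)
coll_a = np.array([quantile_embedding(data_rng.normal(0.0, 1.0, 100)) for _ in range(30)])
coll_b = np.array([quantile_embedding(data_rng.normal(0.5, 1.0, 100)) for _ in range(30)])
p_value = permutation_mean_test(coll_a, coll_b)
```

The permutation scheme stands in for the resampling-based procedure only in spirit. The second procedure, which combines p-values from coordinate-wise tests, would instead test each embedding coordinate separately and then merge the results.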