Data represented by probability measures arise as empirical distributions, posterior distributions, and feature-based representations of complex objects. We study heterogeneity in a population of probability measures through the expected value of a chosen transform of the pairwise Wasserstein distance. The corresponding sample estimator is unbiased and, under simple moment conditions on the population law, is strongly consistent, asymptotically normal, and equipped with a consistent standard error. The approach also yields a simple comparison of two populations and remains stable under plug-in approximation when the measures themselves are estimated. The associated empirical eccentricities identify the observations that contribute most strongly to heterogeneity within a sample.
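As a rough illustration of the quantities described above, the following sketch computes a pairwise-average heterogeneity estimate and per-observation eccentricities for one-dimensional empirical measures. It is a sketch under stated assumptions, not the method as implemented in this work: the function names `wasserstein_1d` and `heterogeneity` are hypothetical, the transform is taken to be the squared distance, each measure is assumed to be an equal-size sample on the real line (so that the Wasserstein distance reduces to matching sorted values), and the standard error is a generic U-statistic plug-in rather than a formula taken from the text.

```python
import math
import random
from itertools import combinations


def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size 1-D empirical measures.

    For equal-size samples on the real line, the optimal coupling matches
    sorted values, so the distance has a closed form.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    a, b = sorted(xs), sorted(ys)
    cost = sum(abs(u - v) ** p for u, v in zip(a, b)) / len(a)
    return cost ** (1.0 / p)


def heterogeneity(measures, transform=lambda d: d ** 2):
    """Return (estimate, standard_error, eccentricities).

    Assumptions (not from the source): squared-distance transform by
    default, and a plug-in standard error 2 * sd(eccentricities) / sqrt(n).
    """
    n = len(measures)
    if n < 2:
        raise ValueError("need at least two measures")
    # Symmetric matrix of transformed pairwise distances, zero diagonal.
    h = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        h[i][j] = h[j][i] = transform(wasserstein_1d(measures[i], measures[j]))
    # Eccentricity of i: mean transformed distance to the other n - 1 measures.
    ecc = [sum(row) / (n - 1) for row in h]
    # Average over all unordered pairs equals the mean eccentricity.
    estimate = sum(ecc) / n
    var_ecc = sum((e - estimate) ** 2 for e in ecc) / (n - 1)
    std_err = 2.0 * math.sqrt(var_ecc / n)
    return estimate, std_err, ecc


if __name__ == "__main__":
    rng = random.Random(0)
    # Ten Gaussian samples with different centres; the last one is shifted far away.
    centres = [rng.gauss(0.0, 1.0) for _ in range(9)] + [8.0]
    sample = [[rng.gauss(c, 1.0) for _ in range(50)] for c in centres]
    est, se, ecc = heterogeneity(sample)
    print("estimate:", est, "standard error:", se)
    print("most eccentric index:", max(range(len(ecc)), key=ecc.__getitem__))
```

Because the estimate is the mean of the eccentricities, ranking observations by eccentricity directly shows which measures drive the heterogeneity in the sample. For measures in higher dimensions or with unequal sample sizes, the closed-form distance above no longer applies and a general optimal transport solver would be needed.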