Optimal transport and Wasserstein distances are flourishing in many scientific fields as a means for comparing and connecting random structures. Here we pioneer the use of an optimal transport distance between L\'{e}vy measures to solve a statistical problem. Dependent Bayesian nonparametric models provide flexible inference on distinct, yet related, groups of observations. Each component of a vector of random measures models a group of exchangeable observations, while their dependence regulates the borrowing of information across groups. We derive the first statistical index of dependence in $[0,1]$ for (completely) random measures that accounts for their whole infinite-dimensional distribution; the marginal distribution of each random measure is assumed to be the same across groups. This is accomplished by using the geometric properties of the Wasserstein distance to solve a max-min problem at the level of the underlying L\'{e}vy measures. The Wasserstein index of dependence sheds light on the models' underlying dependence structure and has desirable properties: (i) it is $0$ if and only if the random measures are independent; (ii) it is $1$ if and only if the random measures are completely dependent; (iii) it simultaneously quantifies the dependence of $d \ge 2$ random measures, avoiding the need for pairwise comparisons; (iv) it can be evaluated numerically. Moreover, the index enables informed prior specification and fair comparison of Bayesian nonparametric models.