The Dirichlet process has been pivotal to the development of Bayesian nonparametrics, allowing one to learn the law of the observations through closed-form expressions. Still, its learning mechanism is often too simplistic, and many generalizations have been proposed to increase its flexibility, a popular one being the class of normalized completely random measures. Here we investigate a simple yet fundamental question: will a different prior actually guarantee a different learning outcome? To this end, we develop a new framework for assessing the merging rate of opinions, based on three pillars: i) the investigation of the identifiability of completely random measures; ii) the measurement of their discrepancy through a novel optimal transport distance; iii) the establishment of general techniques to conduct posterior analyses, unravelling both the finite-sample and the asymptotic behaviour of the distance as the number of observations grows. Our findings provide clear and interpretable insights into the impact of popular Bayesian nonparametric priors, while avoiding the usual restrictive assumptions on the data-generating process.