Species sampling processes have long served as the framework for studying random discrete distributions. However, their statistical applicability is limited when the probabilistic invariance assumed for the observables is partial exchangeability. Although numerous discrete models exist for partially exchangeable observations, a unifying framework is still missing, which leaves many questions about the induced learning mechanisms unanswered in this setting. To fill this gap, we consider the natural extension of species sampling models to a multivariate framework, obtaining a general class of models characterized by their partially exchangeable partition probability function. Among these models is a notable subclass, named regular multivariate species sampling models. Within this subclass, dependence across processes is accurately captured by the correlation among them: a correlation of one corresponds to full exchangeability, and a correlation of zero corresponds to independence. Regular multivariate species sampling models encompass discrete processes for partially exchangeable data used in Bayesian models, thereby highlighting their core distributional properties and providing a means for developing new models.