Although exchangeable processes from Bayesian nonparametrics have been used as a generating mechanism for random partition models, we deviate from this paradigm to explicitly incorporate clustering information in the formulation of our random partition model. Our shrinkage partition distribution takes any partition distribution and shrinks its probability mass toward an anchor partition. We show how this provides a framework for modeling hierarchically dependent and temporally dependent random partitions. The shrinkage parameters control the degree of dependence, accommodating at their extremes both independence and complete equality. Since a priori knowledge of items may vary, our formulation allows the degree of shrinkage toward the anchor to be item-specific. Our random partition model has a tractable normalizing constant, which allows standard Markov chain Monte Carlo algorithms to be used for posterior sampling. We prove intuitive theoretical properties of our distribution and compare it to related partition distributions. We show that our model provides better out-of-sample fit in a real data application.