Transform-scaled process priors for trait allocations in Bayesian nonparametrics

Completely random measures (CRMs) provide a broad class of priors, arguably the most popular, for Bayesian nonparametric (BNP) analysis of trait allocations. A distinctive property of CRM priors is that they lead to predictive distributions sharing a common structure: for fixed prior parameters, a new data point exhibits a Poisson (random) number of ``new'' traits, i.e., traits not appearing in the sample, and this number depends on the sampling information only through the sample size. While the Poisson posterior distribution is appealing for its analytical tractability and ease of interpretation, its independence from the sampling information is a critical drawback, as it makes the posterior distribution of ``new'' traits completely determined by the estimation of the unknown prior parameters. In this paper, we introduce the class of transform-scaled process (T-SP) priors as a tool to enrich the posterior distribution of ``new'' traits arising from CRM priors, while maintaining the same analytical tractability and ease of interpretation. In particular, we present a framework for posterior analysis of trait allocations under T-SP priors, showing that Stable T-SP priors, i.e., T-SP priors built from Stable CRMs, lead to predictive distributions such that, for fixed prior parameters, a new data point displays a negative Binomial (random) number of ``new'' traits, which depends on the sampling information through both the number of distinct traits and the sample size. Then, by relying on a hierarchical version of T-SP priors, we extend our analysis to the more general setting of trait allocations with multiple groups of data or subpopulations. The empirical effectiveness of our methods is demonstrated through numerical experiments and applications to real data.
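
As a minimal illustration of the Poisson structure described above, the sketch below uses the one-parameter Indian buffet process, a standard CRM-based example that is an assumption here and is not named in the abstract. Under that prior, observation n + 1 displays a Poisson(alpha / (n + 1)) number of new traits. The function names, the value of alpha, and the toy samples are hypothetical; the point is only that two samples of the same size yield the same predictive distribution, however many distinct traits they contain.

```python
import math


def ibp_new_trait_rate(alpha: float, n: int) -> float:
    """Poisson mean for the number of new traits of observation n + 1
    under the one-parameter Indian buffet process (assumed prior)."""
    return alpha / (n + 1)


def poisson_pmf(k: int, rate: float) -> float:
    """Probability that a Poisson(rate) variable equals k."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def new_trait_predictive(alpha: float, sample: list, max_k: int = 5) -> list:
    """Predictive probabilities of observing 0..max_k new traits.

    Only len(sample) is used: the traits actually observed play no role,
    which is the drawback the abstract points out.
    """
    rate = ibp_new_trait_rate(alpha, len(sample))
    return [poisson_pmf(k, rate) for k in range(max_k + 1)]


# Two hypothetical samples of size 4: one with a single distinct trait,
# one with eight distinct traits.
sample_few = [{"a"}, {"a"}, {"a"}, {"a"}]
sample_many = [{"a", "b"}, {"c", "d"}, {"e"}, {"f", "g", "h"}]

pred_few = new_trait_predictive(2.0, sample_few)
pred_many = new_trait_predictive(2.0, sample_many)
print(pred_few == pred_many)
```

By contrast, a predictive distribution of the kind described for Stable T-SP priors would also change with the number of distinct traits observed, so the two toy samples would no longer be treated identically.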
