The beta distribution is the default likelihood in many regression problems whose response is supported on $[0,1]$, despite its sensitivity to outliers, its inability to accommodate exact zero observations, and its lack of closed-form conjugate priors. We address these shortcomings by introducing the triply-randomized negative binomial beta distribution, denoted $\mathrm{TNBbeta}(p,\,q,\,\varepsilon)$, which is parameterized by a median $p$, a concentration parameter $q$, and a boundary parameter $\varepsilon$ that permits positive density at $0$ and $1$. The TNBbeta arises by randomizing the parameters of a standard beta distribution with three dependent negative binomial random variables, and we show that the complete conditional distribution of each is itself negative binomial. Moreover, connecting $p$ and $q$ to Gaussian latent variables through logit link functions yields closed-form updates via Pólya-gamma augmentation. Together, these properties yield simple auxiliary-variable Gibbs samplers for regression models of bounded-support data. These samplers often outperform standard beta regression approaches in effective sample size per second and in held-out prediction, especially in the presence of outliers. In a case study of forest canopy cover, we demonstrate that the framework readily incorporates spatial structure and exact zero observations. Overall, this work substantially expands the class of Bayesian models for data supported on $[0,1]$ that can be fit efficiently.
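As a minimal illustration of the exact-zero limitation mentioned in the abstract, the sketch below evaluates the log-density of a standard beta likelihood at an interior point and at $y = 0$. It is not an implementation of the TNBbeta. The mean-precision form $\mathrm{Beta}(\mu\phi,\,(1-\mu)\phi)$ with a logit link is a common beta regression convention assumed here, and the names `beta_logpdf`, `logistic`, `mu`, and `phi`, along with the numeric values, are hypothetical.

```python
import math


def beta_logpdf(y, mu, phi):
    """Log-density of Beta(mu * phi, (1 - mu) * phi) evaluated at y.

    Assumes the mean-precision form with both shape parameters > 1,
    so that the density vanishes at the boundaries of [0, 1].
    """
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    # Treat log(0) as -inf so that boundary observations can be evaluated.
    log_y = math.log(y) if y > 0 else -math.inf
    log_one_minus_y = math.log1p(-y) if y < 1 else -math.inf
    return log_norm + (a - 1.0) * log_y + (b - 1.0) * log_one_minus_y


def logistic(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-x))


mu = logistic(0.3)  # hypothetical value of the linear predictor
phi = 10.0          # hypothetical precision

interior = beta_logpdf(0.4, mu, phi)  # finite log-likelihood contribution
boundary = beta_logpdf(0.0, mu, phi)  # an exact zero is assigned zero density

print(math.isfinite(interior), boundary)
```

Under these assumptions a single exact zero drives the log-likelihood to negative infinity, which is why such observations are typically nudged into the interior or handled by a separate zero-inflation component when a plain beta likelihood is used.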