Within Bayesian nonparametrics, dependent Dirichlet process mixture models provide a highly flexible approach for conducting inference about the conditional density function. However, several formulations of this class either make rather restrictive modelling assumptions or involve intricate algorithms for posterior inference, which prevents their widespread use. In response to these challenges, we present a flexible, versatile, and computationally tractable model for density regression based on a single-weights dependent Dirichlet process mixture of normal distributions for univariate continuous responses. We assume an additive structure for the mean of each mixture component and incorporate the effects of continuous covariates through smooth nonlinear functions. The key components of our modelling approach are penalised B-splines and their bivariate tensor product extension. Our proposed method also seamlessly accommodates parametric effects of categorical covariates, linear effects of continuous covariates, interactions between categorical and/or continuous covariates, varying coefficient terms, and random effects, which is why we refer to our model as a Dirichlet process mixture of normal structured additive regression models. A noteworthy feature of our method is its efficiency in posterior simulation through Gibbs sampling, as closed-form full conditional distributions for all model parameters are available. Results from a simulation study demonstrate that our approach successfully recovers true conditional densities and other regression functionals in various challenging scenarios. Applications to toxicology, disease diagnosis, and agricultural studies are provided and further underpin the broad applicability of our modelling framework. An R package, DDPstar, implementing the proposed method is publicly available at https://bitbucket.org/mxrodriguez/ddpstar.
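To make the model structure concrete, the following is a minimal sketch of how a conditional density is evaluated under a single-weights dependent Dirichlet process mixture of normals: the mixture weights come from a stick-breaking construction and do not depend on the covariate, while each component mean does. It is not the DDPstar implementation. The truncation level `L`, the precision parameter `alpha`, the component standard deviations, and the sine-shaped component means (a stand-in for the penalised B-spline terms) are all illustrative assumptions, as are the function names.

```python
import math
import random


def stick_breaking_weights(v):
    """Turn stick-breaking fractions v_1..v_{L-1} into L weights summing to 1."""
    weights, remaining = [], 1.0
    for v_l in v:
        weights.append(v_l * remaining)
        remaining *= 1.0 - v_l
    weights.append(remaining)  # the last component takes the leftover stick
    return weights


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def conditional_density(y, x, weights, mean_fns, sds):
    """p(y | x) = sum_l w_l * N(y | mu_l(x), sd_l^2); the weights are free of x."""
    return sum(
        w * normal_pdf(y, mu(x), sd)
        for w, mu, sd in zip(weights, mean_fns, sds)
    )


# Hypothetical values, chosen only for illustration.
rng = random.Random(0)
alpha = 1.0  # assumed DP precision parameter
L = 5        # assumed truncation level
v = [rng.betavariate(1.0, alpha) for _ in range(L - 1)]
weights = stick_breaking_weights(v)

# Assumed component means: an intercept plus one smooth covariate effect.
# In the paper's model this smooth term would be a penalised B-spline.
mean_fns = [
    (lambda x, a=a: a + math.sin(2.0 * math.pi * x))
    for a in (-2.0, -1.0, 0.0, 1.0, 2.0)
]
sds = [0.5] * L

density_at_point = conditional_density(0.0, 0.3, weights, mean_fns, sds)
```

Because only the component means vary with `x`, the shape of the conditional density changes with the covariate while the mixing proportions stay fixed, which is the defining feature of the single-weights formulation.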