Within Bayesian nonparametrics, dependent Dirichlet process mixture models provide a highly flexible approach for conducting inference about the conditional density function. However, several formulations of this class either make rather restrictive modelling assumptions or involve intricate algorithms for posterior inference, thus preventing their widespread use. In response to these challenges, we present a flexible, versatile, and computationally tractable model for density regression based on a single-weights dependent Dirichlet process mixture of normal distributions for univariate continuous responses. We assume an additive structure for the mean of each mixture component and incorporate the effects of continuous covariates through smooth nonlinear functions. The key components of our modelling approach are penalised B-splines and their bivariate tensor product extension. Our proposed method also seamlessly accommodates parametric effects of categorical covariates, linear effects of continuous covariates, interactions between categorical and/or continuous covariates, varying coefficient terms, and random effects, which is why we refer to our model as a Dirichlet process mixture of normal structured additive regression models. A noteworthy feature of our method is its efficiency in posterior simulation through Gibbs sampling, as closed-form full conditional distributions for all model parameters are available. Results from a simulation study demonstrate that our approach successfully recovers true conditional densities and other regression functionals in various challenging scenarios. Applications to toxicology, disease diagnosis, and agricultural studies are provided and further underpin the broad applicability of our modelling framework. An R package, \texttt{DDPstar}, implementing the proposed method is publicly available at \url{https://bitbucket.org/mxrodriguez/ddpstar}.
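To make the single-weights structure concrete, the following is a minimal Python sketch, not the \texttt{DDPstar} implementation. It assumes a truncated stick-breaking representation of the mixture weights and evaluates a conditional density of the form p(y | x) = sum_l w_l N(y | mu_l(x), sigma_l^2), where only the component means depend on the covariate. The function names, the truncation level, and the three component mean functions are hypothetical; in the model described above the means would be additive predictors built from penalised B-splines, and all parameters would be drawn by Gibbs sampling rather than fixed by hand.

```python
import math
import random


def stick_breaking_weights(alpha, n_components, rng):
    """Truncated stick-breaking weights: v_l ~ Beta(1, alpha), last v fixed at 1."""
    weights, remaining = [], 1.0
    for l in range(n_components):
        # Fixing the final stick proportion at 1 makes the weights sum to one.
        v = 1.0 if l == n_components - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def normal_pdf(y, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def conditional_density(y, x, weights, mean_fns, sds):
    """Single-weights mixture: the weights are shared across x, the means vary with x."""
    return sum(
        w * normal_pdf(y, mu(x), sd)
        for w, mu, sd in zip(weights, mean_fns, sds)
    )


rng = random.Random(0)
weights = stick_breaking_weights(alpha=1.0, n_components=3, rng=rng)

# Hypothetical component means, standing in for spline-based additive predictors.
mean_fns = [
    lambda x: math.sin(2.0 * x),
    lambda x: 1.0 + 0.5 * x,
    lambda x: -1.0 + x * x,
]
sds = [0.3, 0.5, 0.4]

# Evaluate the conditional density of y at two covariate values.
density_at_x0 = conditional_density(0.0, 0.0, weights, mean_fns, sds)
density_at_x1 = conditional_density(0.0, 1.0, weights, mean_fns, sds)
```

Because the weights do not vary with the covariate, all covariate dependence enters through the component means. This is the structural restriction that keeps the full conditional distributions in closed form, while the flexible mean functions still allow the shape of the conditional density to change with x.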