In structured additive distributional regression, the conditional distribution of the response variables, given the covariate information and the vector of model parameters, is modelled using a P-parametric probability density function. Each parameter is modelled through a linear predictor and a bijective response function that maps the domain of the predictor to the domain of the parameter. We present a method for inference in structured additive distributional regression based on stochastic variational inference. We propose two strategies for constructing a multivariate Gaussian variational distribution that approximates the posterior distribution of the regression coefficients. The first strategy leverages covariate information and hyperparameters to learn both the location vector and the precision matrix. The second strategy addresses the computational complexity of the first by initially assuming independence among all smooth terms and then introducing correlations through an additional set of variational parameters. Furthermore, we present two approaches for estimating the smoothing parameters. The first treats them as free parameters and provides point estimates, while the second accounts for their uncertainty by applying a variational approximation to their posterior distribution. We benchmark our method against state-of-the-art competitors in logistic and gamma regression simulation studies. Finally, we validate our approach by comparing its posterior estimates to those obtained using Markov chain Monte Carlo on a dataset of patents from the biotechnology/pharmaceutics and semiconductor/computer sectors.
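As a rough illustration of two ingredients named above, the following minimal sketch draws regression coefficients from a multivariate Gaussian parameterized by a location vector and a precision matrix, then passes the resulting linear predictor through bijective response functions. It is not the estimation procedure of the paper: no variational objective is optimized, and the design matrix `X`, the Cholesky factor `L`, and all function names are assumptions chosen for illustration. The exponential and logistic response functions are common choices for a positive parameter and a probability, respectively, and are likewise assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix for one parameter's predictor:
# n observations, k regression coefficients.
n, k = 50, 4
X = rng.normal(size=(n, k))

# Assumed variational parameters: a location vector and a lower-triangular
# Cholesky factor L of the precision matrix, i.e. precision = L @ L.T.
loc = np.zeros(k)
L = np.tril(rng.normal(scale=0.1, size=(k, k)))
L[np.diag_indices(k)] = 1.0  # nonzero diagonal keeps L invertible


def sample_coefficients(loc, L, num_samples, rng):
    """Draw beta ~ N(loc, (L L^T)^{-1}) via beta = loc + L^{-T} eps."""
    eps = rng.standard_normal(size=(L.shape[0], num_samples))
    # Solving L^T z = eps gives Cov(z) = L^{-T} L^{-1} = (L L^T)^{-1}.
    z = np.linalg.solve(L.T, eps)
    return loc[:, None] + z  # shape (k, num_samples)


def exp_response(eta):
    """Bijection from the real line to (0, inf), e.g. for a gamma mean."""
    return np.exp(eta)


def logistic_response(eta):
    """Bijection from the real line to (0, 1), e.g. for a success probability."""
    return 1.0 / (1.0 + np.exp(-eta))


beta = sample_coefficients(loc, L, num_samples=1000, rng=rng)
eta = X @ beta                  # linear predictor, shape (n, 1000)
mu = exp_response(eta)          # positive-valued parameter
prob = logistic_response(eta)   # probability-valued parameter
```

Parameterizing the Gaussian by a Cholesky factor of the precision matrix means a sample only requires a triangular solve, and any sparsity in the precision structure could in principle be exploited. Whether the paper uses this exact parameterization is not stated in the abstract.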