Structured additive distributional regression models offer a versatile framework for estimating complete conditional distributions by relating all parameters of a parametric distribution to covariates. Although these models efficiently leverage information in vast and intricate data sets, they often result in highly parameterized models with many unknowns. Standard estimation methods, such as Bayesian approaches based on Markov chain Monte Carlo, struggle with these models because of their complexity and computational cost. To overcome these issues, we propose a fast and scalable alternative based on variational inference. Our approach combines a parsimonious parametric approximation for the posteriors of regression coefficients with the exact conditional posterior for hyperparameters. For optimization, we use a stochastic gradient ascent method combined with an efficient strategy to reduce the variance of the estimators. We provide theoretical properties and investigate global and local annealing to enhance robustness, particularly against data outliers. Our implementation is general and accommodates various functional effects such as penalized splines or complex tensor product interactions. In a simulation study, we demonstrate the efficacy of our approach in terms of accuracy and computation time. Lastly, we present two real-data examples illustrating the modeling of infectious COVID-19 outbreaks and outlier detection in brain activity.
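To make the idea of variational inference with stochastic gradient ascent concrete, the following is a minimal sketch and not the authors' method. It assumes a much simpler setting: a Gaussian linear model with known noise variance `sigma2`, a fixed Gaussian prior variance `tau2` (so no hyperparameter update), and a mean-field Gaussian approximation fitted with plain reparameterization gradients. The variance reduction strategy and the annealing schemes mentioned in the abstract are omitted, and the function name `fit_gaussian_vi`, its parameters, and the simulated data are all hypothetical.

```python
import numpy as np

def fit_gaussian_vi(X, y, sigma2=1.0, tau2=10.0, steps=5000, lr=1e-3, seed=0):
    """Mean-field Gaussian VI for beta in y ~ N(X beta, sigma2 I), beta ~ N(0, tau2 I).

    Illustrative assumption: sigma2 and tau2 are known and fixed.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    mu = np.zeros(p)                   # variational means
    log_sd = np.full(p, np.log(0.1))   # log of variational standard deviations
    for _ in range(steps):
        eps = rng.standard_normal(p)
        sd = np.exp(log_sd)
        beta = mu + sd * eps           # reparameterized draw from q
        # Gradient of the log joint density log p(y, beta) with respect to beta.
        grad_logp = X.T @ (y - X @ beta) / sigma2 - beta / tau2
        # ELBO = E_q[log p(y, beta)] + entropy(q), with entropy = sum(log_sd) + const.
        grad_mu = grad_logp
        grad_log_sd = grad_logp * eps * sd + 1.0
        # Stochastic gradient ascent step with a constant step size.
        mu += lr * grad_mu
        log_sd += lr * grad_log_sd
    return mu, np.exp(log_sd)

# Simulated data (hypothetical values, for illustration only).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.standard_normal(200)

mu_hat, sd_hat = fit_gaussian_vi(X, y)
print("variational means:", mu_hat)
print("variational sds:  ", sd_hat)
```

In this conjugate toy setting the exact posterior is available in closed form, which makes it easy to check that the variational means settle near the exact posterior mean. The constant step size is chosen for this particular data scale; larger designs would likely need a smaller or adaptive step size.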