Variational empirical Bayes (VEB) methods provide a practically attractive approach to fitting large, sparse, multiple regression models. These methods usually use coordinate ascent to optimize the variational objective function, an approach known as coordinate ascent variational inference (CAVI). Here we propose alternative optimization approaches based on gradient-based (quasi-Newton) methods, which we call gradient-based variational inference (GradVI). GradVI exploits a recent result from Kim et al. [arXiv:2208.10910] that writes the VEB regression objective function as a penalized regression. Unfortunately, the penalty function is not available in closed form, and we present and compare two approaches to dealing with this problem. In simple situations where CAVI performs well, we show that GradVI produces similar predictive performance, and that GradVI converges in fewer iterations when the predictors are highly correlated. Furthermore, unlike CAVI, the key computations in GradVI are simple matrix-vector products, so GradVI is much faster than CAVI in settings where the design matrix admits fast matrix-vector products (e.g., as we show here, trend filtering applications), and it lends itself to parallelized implementations in ways that CAVI does not. GradVI is also very flexible, and could exploit automatic differentiation to easily implement different prior families. Our methods are implemented in an open-source Python package, GradVI (available from https://github.com/stephenslab/gradvi).