In this work, we investigate the large-scale mean-field variational inference (MFVI) problem from a mini-batch primal-dual perspective. By reformulating MFVI as a constrained finite-sum problem, we develop a novel primal-dual algorithm based on an augmented Lagrangian formulation, termed primal-dual variational inference (PD-VI). PD-VI jointly updates the global and local variational parameters in the evidence lower bound in a scalable manner. To further account for heterogeneous loss geometry across variational parameter blocks, we introduce a block-preconditioned extension, P$^2$D-VI, which adapts the primal-dual updates to the geometry of each parameter block and improves both numerical robustness and practical efficiency. We establish convergence guarantees for both PD-VI and P$^2$D-VI under a properly chosen constant step size, without relying on conjugacy assumptions or explicit bounded-variance conditions. In particular, we prove $O(1/T)$ convergence to a stationary point in general settings and linear convergence under strong convexity. Numerical experiments on synthetic data and a real large-scale spatial transcriptomics dataset demonstrate that our methods consistently outperform existing stochastic variational inference approaches in both convergence speed and solution quality.