Particle-MALA and Particle-mGRAD: Gradient-based MCMC methods for high-dimensional state-space models

State-of-the-art methods for Bayesian inference in state-space models are (a) conditional sequential Monte Carlo (CSMC) algorithms and (b) sophisticated 'classical' MCMC algorithms like MALA or mGRAD from Titsias and Papaspiliopoulos (2018, arXiv:1610.09641v3 [stat.ML]). The former propose $N$ particles at each time step to exploit the model's 'decorrelation-over-time' property and thus scale favourably with the time horizon, $T$, but break down if the dimension of the latent states, $D$, is large. The latter leverage gradient-/prior-informed local proposals to scale favourably with $D$ but exhibit sub-optimal scalability with $T$ because they do not exploit the model structure. We introduce methods which combine the strengths of both approaches. The first, Particle-MALA, spreads $N$ particles locally around the current state using gradient information, thus extending MALA to $T > 1$ time steps and $N > 1$ proposals. The second, Particle-mGRAD, additionally incorporates (conditionally) Gaussian prior dynamics into the proposal, thus extending the mGRAD algorithm to $T > 1$ time steps and $N > 1$ proposals. We prove that Particle-mGRAD interpolates between CSMC and Particle-MALA, resolving the 'tuning problem' of choosing between CSMC (superior for highly informative prior dynamics) and Particle-MALA (superior for weakly informative prior dynamics). We similarly extend other 'classical' MCMC approaches like auxiliary MALA, aGRAD, and preconditioned Crank-Nicolson-Langevin (PCNL) to $T > 1$ time steps and $N > 1$ proposals. In experiments, for both highly and weakly informative prior dynamics, our methods substantially improve upon both CSMC and sophisticated 'classical' MCMC approaches.
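For orientation, the 'classical' MALA update that Particle-MALA is said to extend can be sketched as follows. This is only the standard single-state algorithm ($T = 1$, $N = 1$), not the paper's Particle-MALA; the function names, the step size, and the standard-Gaussian toy target are assumptions made purely for illustration.

```python
import numpy as np


def mala_step(x, log_target, grad_log_target, step, rng):
    """One standard MALA update for a single D-dimensional state (T = 1, N = 1)."""
    # Langevin proposal: drift along the gradient, then add Gaussian noise.
    mean_fwd = x + 0.5 * step * grad_log_target(x)
    proposal = mean_fwd + np.sqrt(step) * rng.standard_normal(x.shape)

    # Mean of the reverse move, needed for the Metropolis-Hastings correction.
    mean_rev = proposal + 0.5 * step * grad_log_target(proposal)

    # Log proposal densities, up to a constant shared by both directions.
    log_q_fwd = -np.sum((proposal - mean_fwd) ** 2) / (2.0 * step)
    log_q_rev = -np.sum((x - mean_rev) ** 2) / (2.0 * step)

    log_accept = log_target(proposal) - log_target(x) + log_q_rev - log_q_fwd
    if np.log(rng.uniform()) < log_accept:
        return proposal, True
    return x, False


def run_demo(dim=50, n_iter=2000, step=0.3, seed=0):
    """Run MALA on a hypothetical toy target: a standard Gaussian in `dim` dimensions."""
    log_target = lambda x: -0.5 * np.sum(x ** 2)
    grad_log_target = lambda x: -x
    rng = np.random.default_rng(seed)

    x = np.zeros(dim)
    samples = np.empty((n_iter, dim))
    n_accept = 0
    for i in range(n_iter):
        x, accepted = mala_step(x, log_target, grad_log_target, step, rng)
        samples[i] = x
        n_accept += accepted
    return samples, n_accept / n_iter
```

Calling `samples, rate = run_demo()` returns the chain and its empirical acceptance rate. The sketch makes one gradient-informed proposal for one state per iteration, which is the setting the abstract describes as being generalised to $T > 1$ time steps and $N > 1$ proposals.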
