Hierarchical Bayesian models based on Gaussian processes are considered useful for describing complex nonlinear statistical dependencies among variables in real-world data. However, effective Monte Carlo algorithms for inference with these models have so far been established only for a few simple cases. In this study, we show that the performance of Riemannian-manifold Hamiltonian Monte Carlo (RMHMC), which yields slow inference when run with existing program libraries, can be drastically improved by optimising the computation order according to the model structure and by dynamically programming the eigendecomposition. This improvement cannot be achieved with an existing library based on a naive automatic differentiator. We demonstrate numerically that RMHMC samples effectively from the posterior, which allows the model evidence to be calculated, in two settings: a Bayesian logistic regression on simulated data, and the estimation of propensity functions for the American national medical expenditure data using several Bayesian multiple-kernel models. These results lay a foundation for implementing effective Monte Carlo algorithms for analysing real-world data with Gaussian processes. They also highlight the need for a customisable set of libraries that allows users to incorporate dynamically programmed objects and that finely optimises the mode of automatic differentiation depending on the model structure.
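To make the remark about computation order concrete, the following toy sketch may help. It is not the algorithm of this study; the helper names (`matmul`, `quad_form_matrix_first`, `quad_form_vector_first`) and the test matrices are hypothetical and chosen only for illustration. It evaluates the same scalar v^T A B w in two orders and counts scalar multiplications. Forming the matrix product A B first costs on the order of n^3 operations, whereas propagating the vector through the chain costs on the order of n^2. A similar trade-off underlies the choice between modes of automatic differentiation.

```python
# Illustrative sketch only: hypothetical helpers, not the implementation used in the study.

def matmul(A, B, counter):
    """Dense product of matrices stored as lists of rows.

    counter is a one-element list used to accumulate the number of
    scalar multiplications performed.
    """
    n, m, p = len(A), len(B), len(B[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            for j in range(p):
                out[i][j] += a * B[k][j]
                counter[0] += 1
    return out


def quad_form_matrix_first(v, A, B, w):
    """Evaluate v^T A B w by forming the n x n product A B first."""
    counter = [0]
    AB = matmul(A, B, counter)                      # n^3 multiplications
    row = matmul([v], AB, counter)                  # n^2 multiplications
    val = matmul(row, [[x] for x in w], counter)    # n multiplications
    return val[0][0], counter[0]


def quad_form_vector_first(v, A, B, w):
    """Evaluate v^T A B w by pushing the row vector through the chain."""
    counter = [0]
    row = matmul([v], A, counter)                   # n^2 multiplications
    row = matmul(row, B, counter)                   # n^2 multiplications
    val = matmul(row, [[x] for x in w], counter)    # n multiplications
    return val[0][0], counter[0]


# Arbitrary dense test data (an assumption made for this example).
n = 20
A = [[1.0 / (1 + abs(i - j)) for j in range(n)] for i in range(n)]
B = [[float(i == j) + 0.01 * (i + j) for j in range(n)] for i in range(n)]
v = [float(i + 1) for i in range(n)]
w = [1.0 / (i + 1) for i in range(n)]

slow_value, slow_cost = quad_form_matrix_first(v, A, B, w)
fast_value, fast_cost = quad_form_vector_first(v, A, B, w)

print(slow_cost, fast_cost)  # 8420 820
```

Both orders return the same value up to floating-point rounding; only the operation count differs, and the gap widens as n grows.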