With the recent surge of interest in probabilistic models, the efficiency of the underlying sampler has become a crucial consideration. Hamiltonian Monte Carlo (HMC) is a popular choice for such models. Its performance, however, depends strongly on the parameters of the numerical integration of Hamilton's equations. To date, these parameters are chosen mainly heuristically, or by procedures that add computational cost. We propose a novel, computationally inexpensive and flexible approach, which we call Adaptive Tuning (ATune). By combining a theoretical analysis of the multivariate Gaussian model with simulation data generated during the burn-in stage of an HMC simulation, ATune identifies a system-specific splitting integrator together with a set of reliable sampler hyperparameters, including their credible randomization intervals, ready for use in a production simulation. The method automatically eliminates values of the simulation parameters that could lead to undesired extreme scenarios, such as resonance artifacts, low accuracy or poor sampling. The new approach is implemented in the in-house software package HaiCS, introduces no computational overhead in a production simulation, and can be easily incorporated into any package for Bayesian inference with HMC. Tests on popular statistical models show that adaptively tuned standard and generalized HMC methods outperform conventional HMC, tuned heuristically and coupled with well-established integrators, in terms of stability, performance and accuracy. We also argue that generalized HMC is preferable for achieving high sampling performance. The efficiency of the new methodology is assessed against state-of-the-art samplers, e.g. NUTS, in real-world applications, such as endocrine therapy resistance in cancer, modeling of cell-cell adhesion dynamics and an influenza A epidemic outbreak.
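To make the role of the integration parameters concrete, the following is a minimal sketch of standard HMC on a multivariate Gaussian with a diagonal precision matrix. It is not ATune and not the HaiCS implementation: the function names (`stability_limit`, `leapfrog`, `hmc_gaussian`), the velocity Verlet (leapfrog) integrator, and the uniform randomization of the step size are assumptions chosen for illustration. The only theoretical fact it relies on is the well-known stability condition of leapfrog for a harmonic oscillator with frequency ω, namely h·ω < 2.

```python
import math
import random


def stability_limit(omegas):
    """Largest leapfrog step size that is stable for every frequency (h * omega < 2)."""
    return 2.0 / max(omegas)


def leapfrog(q, p, grad_U, h, L):
    """Integrate Hamilton's equations for L steps of size h (velocity Verlet)."""
    q, p = list(q), list(p)
    g = grad_U(q)
    for _ in range(L):
        p = [pi - 0.5 * h * gi for pi, gi in zip(p, g)]  # half kick
        q = [qi + h * pi for qi, pi in zip(q, p)]        # full drift
        g = grad_U(q)
        p = [pi - 0.5 * h * gi for pi, gi in zip(p, g)]  # half kick
    return q, p


def hmc_gaussian(omegas, n_samples, h_interval, L, seed=0):
    """HMC for U(q) = 0.5 * sum((omega_i * q_i)**2), so Var(q_i) = 1 / omega_i**2.

    The step size is drawn uniformly from h_interval at every iteration
    (illustrative randomization; the interval is supplied by the caller).
    Returns the list of samples and the acceptance rate.
    """
    rng = random.Random(seed)

    def U(q):
        return 0.5 * sum((w * x) ** 2 for w, x in zip(omegas, q))

    def grad_U(q):
        return [w * w * x for w, x in zip(omegas, q)]

    def kinetic(p):
        return 0.5 * sum(x * x for x in p)

    q = [0.0] * len(omegas)
    samples, accepted = [], 0
    for _ in range(n_samples):
        h = rng.uniform(h_interval[0], h_interval[1])
        p = [rng.gauss(0.0, 1.0) for _ in omegas]  # momentum refresh
        h_old = U(q) + kinetic(p)
        q_new, p_new = leapfrog(q, p, grad_U, h, L)
        h_new = U(q_new) + kinetic(p_new)
        # Metropolis test on the energy error of the numerical trajectory
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
            accepted += 1
        samples.append(list(q))
    return samples, accepted / n_samples


if __name__ == "__main__":
    omegas = [1.0, 2.0, 4.0]
    h_max = stability_limit(omegas)
    # Keep the whole randomization interval below the stability limit.
    samples, rate = hmc_gaussian(omegas, 2000, (0.2 * h_max, 0.5 * h_max), L=10)
    print("stability limit:", h_max, "acceptance rate:", round(rate, 3))
```

In this sketch the randomization interval is set by hand as a fraction of the stability limit. Drawing the step size from an interval rather than fixing it is a common way to avoid periodic trajectories; how ATune itself derives the integrator and the intervals from burn-in data is not reproduced here.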