With the recent surge of interest in probabilistic models, the efficiency of the underlying sampler has become a crucial consideration. Hamiltonian Monte Carlo (HMC) is a popular option for such models. The performance of the method, however, depends strongly on the choice of parameters associated with the numerical integration of Hamilton's equations. To date, this choice remains mainly heuristic or adds computational cost. We propose a novel, computationally inexpensive and flexible approach, which we call Adaptive Tuning (ATune). It combines a theoretical analysis of the multivariate Gaussian model with simulation data generated during the burn-in stage of an HMC simulation to identify a system-specific splitting integrator together with a set of reliable sampler hyperparameters, including credible randomization intervals for them, ready for use in a production simulation. The method automatically eliminates simulation parameter values that could cause undesired extreme scenarios, such as resonance artifacts, low accuracy or poor sampling. The new approach is implemented in the in-house software package HaiCS, introduces no computational overhead in a production simulation, and can be easily incorporated into any package for Bayesian inference with HMC. Tests on popular statistical models show that adaptively tuned standard and generalized HMC (GHMC) methods outperform conventional HMC, tuned heuristically and coupled with well-established integrators, in terms of stability, performance and accuracy. We also argue that GHMC is preferable for achieving high sampling performance. The efficiency of the new methodology is assessed against state-of-the-art samplers, such as NUTS, in real-world applications: endocrine therapy resistance in cancer, modeling of cell-cell adhesion dynamics, and an influenza A epidemic outbreak.
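The abstract does not spell out ATune itself, so the sketch below only illustrates the ingredients it tunes: an HMC transition driven by a two-stage splitting integrator whose step size and number of integration steps are drawn at random from intervals, applied to a multivariate Gaussian with diagonal covariance (the model class used in the theoretical analysis). The integrator parameter `b = 0.211781` is that of the BCSS two-stage integrator of Blanes, Casas and Sanz-Serna, used here as a stand-in. The target standard deviations `SIGMAS`, the intervals `h_range` and `l_max`, and all function names are hypothetical illustrations, not values produced by ATune or code taken from HaiCS.

```python
import math
import random

# Hypothetical target: a zero-mean Gaussian with independent components,
# i.e. the diagonal case of the multivariate Gaussian model. SIGMAS holds
# illustrative per-coordinate standard deviations (not from the paper).
SIGMAS = [1.0, 0.5, 0.1]

# Parameter of the two-stage BCSS splitting integrator, used as a stand-in;
# b = 0.25 would instead give two velocity Verlet steps of size h/2.
B_BCSS = 0.211781


def potential(q):
    """U(q) = -log pi(q) up to an additive constant."""
    return sum(0.5 * (qi / s) ** 2 for qi, s in zip(q, SIGMAS))


def grad_potential(q):
    return [qi / s ** 2 for qi, s in zip(q, SIGMAS)]


def two_stage_step(q, p, h, b):
    """One step of the symmetric splitting
    kick(b*h) drift(h/2) kick((1-2b)*h) drift(h/2) kick(b*h),
    assuming an identity mass matrix."""
    def kick(p, q, dt):  # momentum update driven by the potential
        return [pi - dt * gi for pi, gi in zip(p, grad_potential(q))]

    def drift(q, p, dt):  # position update driven by the kinetic energy
        return [qi + dt * pi for qi, pi in zip(q, p)]

    p = kick(p, q, b * h)
    q = drift(q, p, h / 2)
    p = kick(p, q, (1 - 2 * b) * h)
    q = drift(q, p, h / 2)
    p = kick(p, q, b * h)
    return q, p


def hmc_step(q, rng, h_range, l_max, b=B_BCSS):
    """One HMC transition; the step size is drawn uniformly from h_range and
    the number of integration steps uniformly from 1..l_max (hypothetical
    randomization intervals)."""
    p = [rng.gauss(0.0, 1.0) for _ in q]
    h = rng.uniform(*h_range)
    n_steps = rng.randint(1, l_max)
    energy_old = potential(q) + 0.5 * sum(pi * pi for pi in p)
    q_new, p_new = q, p
    for _ in range(n_steps):
        q_new, p_new = two_stage_step(q_new, p_new, h, b)
    energy_new = potential(q_new) + 0.5 * sum(pi * pi for pi in p_new)
    # Metropolis test on the energy error introduced by the integrator.
    if rng.random() < math.exp(min(0.0, energy_old - energy_new)):
        return q_new, True
    return q, False


def run_chain(n_iter, seed=0, h_range=(0.05, 0.15), l_max=20, burn_in=500):
    """Run the chain, discard burn_in iterations, return samples and the
    post-burn-in acceptance rate."""
    rng = random.Random(seed)
    q = [0.0] * len(SIGMAS)
    samples, n_accepted = [], 0
    for i in range(n_iter):
        q, accepted = hmc_step(q, rng, h_range, l_max)
        if i >= burn_in:
            samples.append(q)
            n_accepted += accepted
    return samples, n_accepted / (n_iter - burn_in)


if __name__ == "__main__":
    samples, acc_rate = run_chain(5000)
    var0 = sum(s[0] ** 2 for s in samples) / len(samples)
    print(f"acceptance rate: {acc_rate:.3f}, variance of q[0]: {var0:.3f}")
```

Running the module prints the post-burn-in acceptance rate and the sample variance of the first coordinate, which should be close to `SIGMAS[0] ** 2 = 1`. Passing `b=0.25` to `hmc_step` recovers two velocity Verlet steps of size `h/2` per integration step, a convenient baseline for comparing integrators.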