Rare-event sampling has long been the central limiting factor in molecular dynamics (MD), especially in biomolecular simulation. Recently, diffusion models such as BioEmu have emerged as powerful equilibrium samplers that generate independent samples from complex molecular distributions, eliminating the cost of sampling rare transition events. However, a sampling problem remains when computing observables that depend on states that are rare at equilibrium, such as folding free energies. Here, we introduce enhanced diffusion sampling, which enables efficient exploration of rare-event regions while preserving unbiased thermodynamic estimators. The key idea is to apply quantitatively accurate steering protocols to generate biased ensembles and then recover equilibrium statistics via exact reweighting. We instantiate the framework in three algorithms: UmbrellaDiff (umbrella sampling with diffusion models), $\Delta$G-Diff (free-energy differences via tilted ensembles), and MetaDiff (a batchwise analogue of metadynamics). Across toy systems, protein folding landscapes, and folding free energies, our methods achieve fast, accurate, and scalable estimation of equilibrium properties within GPU-minutes to GPU-hours per system, closing the rare-event sampling gap that remained after the advent of diffusion-model equilibrium samplers.
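The bias-then-reweight idea can be illustrated on a one-dimensional toy problem. The sketch below is not the paper's algorithm and involves no diffusion model: it assumes a standard normal "equilibrium" density, a linear tilt whose biased ensemble happens to be an exactly sampleable Gaussian, and a self-normalized importance-sampling estimator. The function name `tilted_estimate` and all of its parameters are hypothetical and chosen only for this illustration.

```python
import math

import numpy as np


def tilted_estimate(n_samples=200_000, tilt=1.5, threshold=3.0, seed=0):
    """Estimate P(x > threshold) under a standard normal p(x) by sampling
    a tilted (biased) ensemble and reweighting back to p(x).

    Toy assumption: the biased density q(x) ~ p(x) * exp(tilt * x) is the
    Gaussian N(tilt, 1), so it can be sampled directly.
    """
    rng = np.random.default_rng(seed)

    # Draw independent samples from the biased ensemble q = N(tilt, 1).
    x = rng.normal(loc=tilt, scale=1.0, size=n_samples)

    # Importance weights p(x) / q(x) are proportional to exp(-tilt * x).
    # Work in log space and subtract the maximum for numerical stability;
    # the constant cancels in the self-normalized estimator below.
    log_w = -tilt * x
    log_w -= log_w.max()
    w = np.exp(log_w)

    # Self-normalized estimate of the equilibrium probability of the rare state.
    in_rare_state = x > threshold
    prob = np.sum(w * in_rare_state) / np.sum(w)

    # Free energy of the rare state relative to the full ensemble, in units of kT.
    free_energy = -math.log(prob)
    return prob, free_energy


prob_est, free_energy_est = tilted_estimate()

# Closed-form reference for the standard normal tail probability.
prob_exact = 0.5 * math.erfc(3.0 / math.sqrt(2.0))
```

Comparing `prob_est` with `prob_exact` shows how closely the reweighted biased ensemble recovers the equilibrium tail probability. In this toy setting a moderate tilt is used deliberately: a very strong tilt would leave the weight normalization dominated by a handful of samples, which is one motivation for using several overlapping biased ensembles, as in umbrella sampling.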