Rare Event Analysis via Stochastic Optimal Control

Rare events such as conformational changes in biomolecules, phase transitions, and chemical reactions are central to the behavior of many physical systems, yet they are extremely difficult to study computationally because unbiased simulations seldom produce them. Transition Path Theory (TPT) provides a rigorous statistical framework for analyzing such events: it characterizes the ensemble of reactive trajectories between two designated metastable states (reactant and product). Its central object is the committor function, which gives the probability that the system will next reach the product rather than the reactant and encodes all essential kinetic and thermodynamic information. We introduce a framework that casts committor estimation as a stochastic optimal control (SOC) problem. In this formulation the committor defines a feedback control, proportional to the gradient of its logarithm, that actively steers trajectories toward the reactive region, thereby enabling efficient sampling of reactive paths. To solve the resulting hitting-time control problem, we develop two complementary objectives: a direct backpropagation loss and a principled off-policy Value Matching loss, for which we establish first-order optimality guarantees. We further address metastability, which can trap controlled trajectories in intermediate basins, by introducing an alternative sampling process that preserves the reactive current while lowering effective energy barriers. On benchmark systems, the framework yields markedly more accurate committor estimates, reaction rates, and equilibrium constants than existing methods.
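To make the idea of a committor-derived feedback control concrete, the following is a minimal sketch and not the method proposed here. It assumes one-dimensional overdamped Langevin dynamics dX = -U'(X) dt + sqrt(2/beta) dW in a double-well potential, for which the committor is available by quadrature, q(x) = ∫_a^x e^{beta U} / ∫_a^b e^{beta U}. Under this assumed noise scaling the control drift is (2/beta) d/dx log q(x). The potential, the value of beta, the set boundaries, and all function names are illustrative assumptions; the framework described above learns the committor instead of using a closed form.

```python
import numpy as np

BETA = 3.0                    # inverse temperature (assumed value)
A_EDGE, B_EDGE = -1.0, 1.0    # reactant A = {x <= -1}, product B = {x >= 1} (assumed)

def potential(x):
    """Double-well potential with minima at x = -1 and x = +1 (assumed)."""
    return (x**2 - 1.0) ** 2

def grad_potential(x):
    return 4.0 * x * (x**2 - 1.0)

# Committor by trapezoidal quadrature on a grid between the two sets.
GRID = np.linspace(A_EDGE, B_EDGE, 2001)
_weights = np.exp(BETA * potential(GRID))
_cum = np.concatenate(
    ([0.0], np.cumsum(0.5 * (_weights[1:] + _weights[:-1]) * np.diff(GRID)))
)
NORMALIZER = _cum[-1]
Q_GRID = _cum / NORMALIZER

def committor(x):
    """Probability of reaching B before A when starting from x."""
    return np.interp(x, GRID, Q_GRID, left=0.0, right=1.0)

def control_drift(x, q_floor=1e-3):
    """(2/beta) * d/dx log q(x); q is floored to keep the drift finite near A."""
    dq = np.exp(BETA * potential(x)) / NORMALIZER   # q'(x) between A and B
    return (2.0 / BETA) * dq / np.maximum(committor(x), q_floor)

def simulate(x0, controlled, rng, dt=1e-3, max_steps=200_000):
    """Euler-Maruyama until the trajectory hits A or B; returns 'A', 'B' or 'none'."""
    x = x0
    sigma = np.sqrt(2.0 * dt / BETA)
    for _ in range(max_steps):
        if x <= A_EDGE:
            return "A"
        if x >= B_EDGE:
            return "B"
        drift = -grad_potential(x)
        if controlled:
            drift += control_drift(x)
        x += drift * dt + sigma * rng.standard_normal()
    return "none"

def fraction_reaching_product(n, controlled, x0=-0.9, seed=0):
    """Fraction of n trajectories started at x0 that hit B before A."""
    rng = np.random.default_rng(seed)
    hits = sum(simulate(x0, controlled, rng) == "B" for _ in range(n))
    return hits / n
```

Calling `fraction_reaching_product(20, controlled=True)` and `fraction_reaching_product(20, controlled=False)` should show the contrast: started close to the reactant, uncontrolled trajectories reach the product with probability roughly q(x0), which is small, whereas the controlled ones are steered across the barrier in most runs. The floor on q and the fixed time step are numerical safeguards chosen for this sketch, not part of the theory.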
