We propose a simulation-free algorithm for the solution of generic problems in stochastic optimal control (SOC). Unlike existing methods, our approach does not require the solution of an adjoint problem, but rather leverages Girsanov's theorem to directly calculate the gradient of the SOC objective on-policy. This allows us to speed up the optimization of control policies parameterized by neural networks, since it completely avoids the expensive back-propagation step through stochastic differential equations (SDEs) used in the Neural SDE framework. In particular, it enables us to solve SOC problems in high dimension and over long time horizons. We demonstrate the efficiency of our approach across several application domains, including standard stochastic optimal control problems, sampling from unnormalized distributions via the construction of a Schr\"odinger-F\"ollmer process, and fine-tuning of pre-trained diffusion models. In all cases, our method is shown to outperform existing methods in terms of both computing time and memory efficiency.