Stochastic optimal control (SOC) aims to direct the behavior of noisy systems and has widespread applications in science, engineering, and artificial intelligence. In particular, reward fine-tuning of diffusion and flow matching models, as well as sampling from unnormalized distributions, can be recast as SOC problems. Recent work introduced Adjoint Matching (Domingo-Enrich et al., 2024), a loss function for SOC problems that vastly outperforms existing loss functions in the reward fine-tuning setup. The goal of this work is to clarify the connections between all the existing (and some new) SOC loss functions. Namely, we show that SOC loss functions can be grouped into classes that share the same gradient in expectation, which means that their optimization landscape is the same; they differ only in their gradient variance. We perform simple SOC experiments to understand the strengths and weaknesses of different loss functions.