We present DeFlow, a decoupled offline RL framework that leverages flow matching to faithfully capture complex behavior manifolds. Optimizing generative policies is computationally prohibitive, typically necessitating backpropagation through ODE solvers. We address this by learning a lightweight refinement module within an explicit, data-derived trust region of the flow manifold, rather than sacrificing iterative generation via single-step distillation. In this way, we bypass solver differentiation and eliminate the need to balance competing loss terms, ensuring stable policy improvement while fully preserving the flow's iterative expressivity. Empirically, DeFlow achieves superior performance on the challenging OGBench benchmark and demonstrates efficient offline-to-online adaptation.
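To make the decoupling concrete, below is a minimal PyTorch sketch of the idea as we read it from the abstract; it is not the authors' implementation. `VelocityNet`, `Refiner`, the Euler step count, and the trust-region radius `eps` are illustrative assumptions. It shows the two key properties claimed above: the flow policy is sampled under `torch.no_grad()` (so the ODE solver is never differentiated), and only a small refinement head is trained, with the trust region enforced by construction via a bounded `tanh` correction rather than by a penalty term that would need balancing.

```python
# Hypothetical sketch of a decoupled flow policy + trust-region refiner.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 17, 6  # placeholder dimensions

class VelocityNet(nn.Module):
    """Velocity field v(s, a_t, t) of a pretrained flow-matching policy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM))

    def forward(self, s, a, t):
        return self.net(torch.cat([s, a, t], dim=-1))

@torch.no_grad()  # the ODE solver is never differentiated
def flow_sample(v_net, s, n_steps=10):
    """Euler integration of the flow ODE from noise to an action."""
    a = torch.randn(s.shape[0], ACTION_DIM)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = torch.full((s.shape[0], 1), k * dt)
        a = a + dt * v_net(s, a, t)
    return a

class Refiner(nn.Module):
    """Lightweight head; output is confined to an eps-ball around the flow sample."""
    def __init__(self, eps=0.1):  # eps would be derived from the data in practice
        super().__init__()
        self.eps = eps
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
            nn.Linear(256, ACTION_DIM))

    def forward(self, s, base_a):
        delta = self.eps * torch.tanh(self.net(torch.cat([s, base_a], dim=-1)))
        return base_a + delta  # trust region holds by construction, no penalty term

# One policy-improvement step against a critic q_net (assumed given):
v_net, refiner = VelocityNet(), Refiner()
q_net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 256), nn.ReLU(),
                      nn.Linear(256, 1))
opt = torch.optim.Adam(refiner.parameters(), lr=3e-4)

s = torch.randn(32, STATE_DIM)           # a batch of states
base = flow_sample(v_net, s)             # detached flow-policy sample
loss = -q_net(torch.cat([s, refiner(s, base)], dim=-1)).mean()
opt.zero_grad(); loss.backward(); opt.step()  # gradients reach only the refiner
```

Because the base action is produced without gradients, the single Q-maximization loss above is the entire improvement objective; under these assumptions, the multi-step generation procedure of the flow remains intact at deployment time.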