Diffusion models have emerged as powerful tools for planning and control by learning multimodal distributions over actions and trajectories. Yet reliable inference-time safety enforcement remains a key barrier to their deployment in safety-critical tasks. Existing approaches typically project each denoising iterate onto the feasible set, even though constraints are defined only on the final clean trajectory. Enforcing feasibility on noisy intermediate samples can therefore overconstrain the sampling dynamics, substantially degrading sample quality. To address this limitation, we introduce DiRecT (Diffusion-based planning via Receding-horizon denoising with Terminal constraints), a training-free algorithm for constrained sampling from diffusion models via stochastic optimal control (SOC). DiRecT enforces constraints only on the final clean sample, avoiding unnecessary restrictions on the intermediate denoising dynamics. Inspired by model predictive control, we derive a principled receding-horizon surrogate for the otherwise intractable constrained SOC formulation. The resulting algorithm is efficient and cleanly separates stochastic denoising from constraint satisfaction: it progressively steers samples toward feasible final trajectories without distorting the learned diffusion dynamics. Furthermore, DiRecT is highly flexible: it can leverage off-the-shelf or domain-specific optimizers, incorporate priors over environment dynamics, and optimize additional soft rewards. Extensive experiments on safe planning benchmarks demonstrate that DiRecT substantially improves deployment safety and task performance over existing diffusion-based planning baselines.
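To make the distinction between projecting every denoising iterate and constraining only the final clean sample concrete, the following is a minimal toy sketch. It is not DiRecT: it omits the receding-horizon SOC surrogate and the progressive steering entirely and only contrasts where the projection is applied. Everything in it is an assumption made for illustration: the 1-D Gaussian data distribution (which gives a closed-form denoiser), the linear noise schedule, the interval constraint, and the hypothetical helper names `project`, `make_schedule`, `denoise`, and `sample`.

```python
import math
import random


def project(x, lo, hi):
    """Euclidean projection of a scalar onto the feasible interval [lo, hi]."""
    return min(max(x, lo), hi)


def make_schedule(num_steps, beta_min=1e-4, beta_max=0.05):
    """Linear beta schedule. Returns betas[0..T] (betas[0] unused) and abar[0..T], abar[0] = 1."""
    denom = max(num_steps - 1, 1)
    betas = [0.0] + [beta_min + (beta_max - beta_min) * i / denom for i in range(num_steps)]
    abar = [1.0]
    for t in range(1, num_steps + 1):
        abar.append(abar[-1] * (1.0 - betas[t]))
    return betas, abar


def denoise(x_t, abar_t, mu, sigma):
    """Exact posterior mean E[x_0 | x_t] for toy data x_0 ~ N(mu, sigma^2)."""
    var_t = abar_t * sigma ** 2 + (1.0 - abar_t)
    gain = math.sqrt(abar_t) * sigma ** 2 / var_t
    return mu + gain * (x_t - math.sqrt(abar_t) * mu)


def sample(rng, mode, num_steps=50, mu=0.0, sigma=1.0, lo=0.5, hi=2.0):
    """Ancestral sampling with one of three (illustrative) constraint-handling modes.

    mode = "none":     unconstrained sampling
    mode = "per_step": project every noisy iterate onto [lo, hi]
    mode = "terminal": leave the iterates untouched, project only the final sample
    """
    if mode not in ("none", "per_step", "terminal"):
        raise ValueError(f"unknown mode: {mode}")
    betas, abar = make_schedule(num_steps)
    # Draw x_T from the exact forward marginal of the toy Gaussian data.
    std_T = math.sqrt(abar[-1] * sigma ** 2 + (1.0 - abar[-1]))
    x = rng.gauss(math.sqrt(abar[-1]) * mu, std_T)
    for t in range(num_steps, 0, -1):
        x0_hat = denoise(x, abar[t], mu, sigma)
        # Coefficients of the Gaussian posterior q(x_{t-1} | x_t, x_0).
        c0 = math.sqrt(abar[t - 1]) * betas[t] / (1.0 - abar[t])
        ct = math.sqrt(1.0 - betas[t]) * (1.0 - abar[t - 1]) / (1.0 - abar[t])
        var = betas[t] * (1.0 - abar[t - 1]) / (1.0 - abar[t])
        x = c0 * x0_hat + ct * x + math.sqrt(var) * rng.gauss(0.0, 1.0)
        if mode == "per_step":
            x = project(x, lo, hi)  # constrains a noisy intermediate sample
    if mode == "terminal":
        x = project(x, lo, hi)  # constrains only the clean sample
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    for mode in ("none", "per_step", "terminal"):
        xs = [sample(rng, mode) for _ in range(1000)]
        print(mode, "mean:", sum(xs) / len(xs), "min:", min(xs), "max:", max(xs))
```

Both constrained modes return feasible samples in this toy, but only the `per_step` mode alters the intermediate denoising trajectory. The sketch says nothing about sample quality on real planning tasks; it is meant only to show where in the loop each kind of enforcement acts.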