Diffusion policies have emerged as powerful generative models for offline policy learning, whose sampling process can be rigorously characterized as a Stochastic Differential Equation (SDE) guided by a score function. However, the same score-based SDE modeling that grants diffusion policies the flexibility to learn diverse behaviors also incurs solver and score-matching errors, large data requirements, and inconsistencies in action generation. While such inaccuracies are less critical in image generation, they compound in continuous control settings and can lead to outright failure. We introduce Contractive Diffusion Policies (CDPs), which induce contractive behavior in the diffusion sampling dynamics. Contraction pulls nearby flows closer together, enhancing robustness to solver and score-matching errors while reducing unwanted action variance. We develop an in-depth theoretical analysis along with a practical recipe for incorporating CDPs into existing diffusion policy architectures with minimal modification and computational overhead. We evaluate CDPs for offline learning through extensive experiments in simulated and real-world settings. Across benchmarks, CDPs often outperform baseline policies, with particularly pronounced benefits under data scarcity.
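As a point of reference (a generic sketch, not the paper's own construction), the sampling process alluded to above is typically the score-guided reverse-time SDE, and "contractive" sampling dynamics can be read as a one-sided Lipschitz condition on its drift; the symbols $f$, $g$, $p_t$, and $\lambda$ below are standard notation assumed for illustration, not quantities defined in this section.

% Standard score-guided reverse-time SDE used by diffusion samplers
% (f: forward drift, g: diffusion coefficient, p_t: marginal density, \bar{W}_t: reverse-time Brownian motion)
\[
  \mathrm{d}X_t \;=\; \bigl[\, f(X_t, t) \;-\; g(t)^2 \,\nabla_x \log p_t(X_t) \,\bigr]\,\mathrm{d}t \;+\; g(t)\,\mathrm{d}\bar{W}_t .
\]
% A generic contraction condition on the drift b(x,t) = f(x,t) - g(t)^2 \nabla_x \log p_t(x):
% one-sided Lipschitz continuity with constant -\lambda < 0,
\[
  \bigl\langle b(x,t) - b(y,t),\; x - y \bigr\rangle \;\le\; -\lambda\, \lVert x - y \rVert^2 ,
\]
% implies, under synchronous coupling (shared noise), that nearby trajectories converge,
\[
  \lVert X_t - Y_t \rVert \;\le\; e^{-\lambda t}\, \lVert X_0 - Y_0 \rVert ,
\]
% so perturbations from solver or score-matching errors are damped rather than amplified.

The exact mechanism by which CDPs enforce such a condition is developed in the body of the paper; the inequalities above only illustrate why contraction yields robustness to sampling errors.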