Diffusion policies have achieved remarkable success in robotic manipulation, yet they often fail to satisfy the strict physical constraints required for safe deployment. Existing approaches impose safety either prematurely, during training, or reactively, through external guardrails at test time, which limits policy expressivity and scalability. We propose Physical safety Alignment for Constrained Trajectories (PACT), a self-evolving post-training framework that projects pretrained diffusion policies onto constraint-feasible regions without access to demonstration data or task rewards. PACT distills constraint gradients into the diffusion model through a reverse-KL objective with dense supervision across timesteps. It incorporates a curriculum that progressively tightens constraints while maintaining a theoretically bounded policy shift and monotone improvement, mitigating the safety-performance trade-off caused by catastrophic forgetting. On simulated and real-world embodied manipulation benchmarks, PACT reduces safety violations by 31.0% on average while improving task success by 30.7%.
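The abstract does not say how the curriculum tightens constraints, so the following is only a minimal sketch of the general idea, assuming each constraint reduces to a scalar cost per trajectory and the allowed threshold shrinks linearly across stages. The function names `tightening_schedule` and `violation_rate`, and the linear schedule itself, are hypothetical illustrations and are not taken from PACT.

```python
def tightening_schedule(loose, tight, num_stages):
    """Return thresholds that shrink linearly from `loose` to `tight`.

    Assumption: a linear schedule over a scalar threshold. The paper's
    actual curriculum is not specified in the abstract.
    """
    if num_stages < 2:
        raise ValueError("num_stages must be at least 2")
    if tight > loose:
        raise ValueError("tight threshold must not exceed loose threshold")
    step = (loose - tight) / (num_stages - 1)
    return [loose - i * step for i in range(num_stages)]


def violation_rate(costs, threshold):
    """Fraction of trajectories whose constraint cost exceeds `threshold`."""
    if not costs:
        raise ValueError("costs must be non-empty")
    return sum(1 for c in costs if c > threshold) / len(costs)


# Hypothetical per-trajectory constraint costs, for illustration only.
example_costs = [0.1, 0.4, 0.6, 0.9]
for eps in tightening_schedule(1.0, 0.5, 3):
    print(f"threshold={eps:.2f} violation_rate={violation_rate(example_costs, eps):.2f}")
```

For a fixed set of trajectories, the measured violation rate can only rise or stay level as the threshold tightens, so a post-training method has to shift the policy at each stage to keep violations low. Small steps between stages are one simple way to keep each shift modest, which is the intuition behind a bounded policy shift.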