Existing imitation learning methods decouple perception and action, overlooking the causal reciprocity between sensory representations and action execution that humans naturally leverage for adaptive behavior. To bridge this gap, we introduce Action-Guided Diffusion Policy (DP-AG), a unified representation learning framework that explicitly models the dynamic interplay between perception and action through probabilistic latent dynamics. DP-AG encodes latent observations into a Gaussian posterior via variational inference and evolves them using an action-guided stochastic differential equation (SDE), where the Vector-Jacobian Product (VJP) of the diffusion policy's noise predictions serves as a structured stochastic force driving latent updates. To promote bidirectional learning between perception and action, we introduce a cycle-consistent contrastive loss that organizes the gradient flow of the noise predictor into a coherent perception-action loop, enforcing mutually consistent transitions in both latent updates and action refinements. Theoretically, we derive a variational lower bound for the action-guided SDE and prove that the contrastive objective enhances continuity in both latent and action trajectories. Empirically, DP-AG significantly outperforms state-of-the-art methods across simulation benchmarks and real-world UR5 manipulation tasks. DP-AG thus offers a promising step toward bridging biological adaptability and artificial policy learning.
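To make the action-guided latent update more concrete, the following is a minimal sketch of a single Euler-Maruyama step in which the VJP of a noise predictor acts as the drift on the latent. It is not the DP-AG implementation: the toy predictor `tanh(W z + U a)`, the SDE form `dz = VJP dt + sigma dW`, the choice of the cotangent vector `v`, and all function and parameter names are assumptions made purely for illustration, since the abstract does not specify them.

```python
import math
import random


def noise_predictor(z, a, W, U):
    """Toy stand-in for a diffusion policy's noise predictor: tanh(W z + U a).

    z: latent (length d_z), a: noisy action (length d_a),
    W: d_a x d_z matrix, U: d_a x d_a matrix (all hypothetical).
    """
    return [
        math.tanh(sum(w * zj for w, zj in zip(W_row, z))
                  + sum(u * aj for u, aj in zip(U_row, a)))
        for W_row, U_row in zip(W, U)
    ]


def vjp_wrt_latent(z, a, v, W, U):
    """Vector-Jacobian product v^T (d eps / d z), derived by hand for the toy model."""
    eps = noise_predictor(z, a, W, U)
    # d tanh(x)/dx = 1 - tanh(x)^2, so v^T J = ((1 - eps^2) * v)^T W
    scaled = [(1.0 - e * e) * vi for e, vi in zip(eps, v)]
    return [
        sum(scaled[i] * W[i][j] for i in range(len(W)))
        for j in range(len(z))
    ]


def latent_sde_step(z, a, v, W, U, dt=0.01, sigma=0.1, rng=random):
    """One Euler-Maruyama step of the assumed SDE dz = VJP dt + sigma dW."""
    force = vjp_wrt_latent(z, a, v, W, U)
    return [
        zj + fj * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        for zj, fj in zip(z, force)
    ]
```

In a real system the Jacobian would not be written out by hand; an automatic differentiation library would compute the VJP through the actual noise-prediction network. The analytic form is used here only to keep the sketch dependency-free.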