While a robot senses and selects actions, the world keeps changing. The resulting inference delay creates a gap of tens to hundreds of milliseconds between the observed state and the state at execution time. In this work, we generalize from the usual zero-delay assumption to measured delay, in both training and inference. We introduce Delay-Aware Diffusion Policy (DA-DP), a framework for explicitly incorporating inference delays into policy learning. DA-DP corrects zero-delay trajectories to their delay-compensated counterparts and augments the policy with delay conditioning. We empirically validate DA-DP on a variety of tasks, robots, and delays, and find that its success rate is more robust to delay than that of delay-unaware methods. DA-DP is architecture-agnostic and transfers beyond diffusion policies, offering a general pattern for delay-aware imitation learning. More broadly, DA-DP encourages evaluation protocols that report performance as a function of measured latency, not just task difficulty.
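To make the idea of delay compensation and delay conditioning concrete, the following is a minimal sketch of one possible reading, not the DA-DP procedure itself. It assumes a demonstration recorded at a fixed control period and relabels each observation with the action chunk that begins one delay later, alongside the delay value the policy would be conditioned on. The function name `delay_compensated_pairs` and the parameters `delay_s`, `dt`, and `horizon` are hypothetical and do not come from the paper.

```python
import numpy as np


def delay_compensated_pairs(obs, actions, delay_s, dt, horizon):
    """Pair each observation with the action chunk starting `delay_s` later.

    Illustrative assumption only: the paper's actual correction step is not
    specified in the abstract and may differ from this index shift.
    """
    obs = np.asarray(obs)
    actions = np.asarray(actions)
    if len(obs) != len(actions):
        raise ValueError("obs and actions must have the same length")

    # Convert the measured delay from seconds to whole control steps.
    k = int(round(delay_s / dt))

    # Number of observations that still have a full shifted chunk available.
    n = len(actions) - k - horizon + 1
    if n <= 0:
        raise ValueError("trajectory too short for this delay and horizon")

    # Row i holds the indices i + k, ..., i + k + horizon - 1.
    idx = np.arange(n)[:, None] + k + np.arange(horizon)[None, :]

    # Delay value repeated per sample, to be fed to the policy as conditioning.
    delay_cond = np.full((n, 1), delay_s, dtype=np.float32)
    return obs[:n], delay_cond, actions[idx]


if __name__ == "__main__":
    # Toy trajectory: 10 steps, 1-D observations and actions.
    toy_obs = np.arange(10).reshape(10, 1)
    toy_actions = toy_obs * 10
    o, d, a = delay_compensated_pairs(toy_obs, toy_actions, 0.2, 0.1, 3)
    print(o.shape, d.shape, a.shape)
```

Under these assumptions, a zero delay reduces to the usual pairing of each observation with the chunk starting at the same step, so the delay-unaware setting is recovered as a special case.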