A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise. This assumption is limiting: an agent may encounter settings that dramatically alter the impact of actions. A "move ahead" action on a wet floor may send the agent twice as far as it expects, and the same action with a broken wheel might transform the expected translation into a rotation. Instead of relying on the assumption that the impact of an action stably reflects its pre-defined semantic meaning, we propose to model the impact of actions on-the-fly using latent embeddings. By combining these latent action embeddings with a novel transformer-based policy head, we design an Action Adaptive Policy (AAP). We evaluate our AAP on two challenging visual navigation tasks in the AI2-THOR and Habitat environments and show that it is highly performant even when faced, at inference time, with missing actions and previously unseen, perturbed action spaces. Moreover, we observe significant improvement in robustness against these actions when evaluating in real-world scenarios.
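The idea of estimating what an action does from observation, rather than trusting its label, can be illustrated with a toy sketch. This is not the AAP architecture: it replaces the learned latent embeddings with a running mean of observed effects and the transformer-based policy head with a nearest-neighbour lookup. The action names, the nominal step sizes, the `perturbed_effect` simulator, and the `ActionImpactMemory` and `choose_action` helpers are all hypothetical and introduced only for illustration.

```python
import numpy as np

# Nominal (pre-defined) semantics: (forward displacement in m, rotation in degrees).
# These values are illustrative assumptions, not taken from the paper.
NOMINAL = {
    "move_ahead": (0.25, 0.0),
    "rotate_left": (0.0, 30.0),
    "rotate_right": (0.0, -30.0),
}


def perturbed_effect(action, condition):
    """Hypothetical simulator: what an action actually does under a condition."""
    forward, turn = NOMINAL[action]
    if action == "move_ahead" and condition == "wet_floor":
        return (2.0 * forward, turn)  # the agent slides twice as far
    if action == "move_ahead" and condition == "broken_wheel":
        return (0.0, 30.0)  # the translation becomes a rotation
    return (forward, turn)


class ActionImpactMemory:
    """Running-mean estimate of each action's observed impact.

    A crude stand-in for a learned latent action embedding.
    """

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def update(self, action, observed):
        observed = np.asarray(observed, dtype=float)
        self.sums[action] = self.sums.get(action, np.zeros(2)) + observed
        self.counts[action] = self.counts.get(action, 0) + 1

    def embedding(self, action):
        return self.sums[action] / self.counts[action]


def choose_action(memory, desired, available):
    """Pick the available, already-observed action whose estimated impact is
    closest to `desired`. Metres and degrees are mixed in one norm, which is
    acceptable only because this is a toy example."""
    tried = [a for a in available if a in memory.counts]
    dists = [np.linalg.norm(memory.embedding(a) - np.asarray(desired, dtype=float))
             for a in tried]
    return tried[int(np.argmin(dists))]


# Usage: observe each action once on a wet floor, then plan with the estimates.
memory = ActionImpactMemory()
for name in NOMINAL:
    memory.update(name, perturbed_effect(name, "wet_floor"))

print(memory.embedding("move_ahead"))
# With "move_ahead" missing, the choice falls back to the remaining actions.
print(choose_action(memory, (0.0, -30.0), ["rotate_left", "rotate_right"]))
```

Because the selection step iterates over whatever actions are currently available and consults only their observed effects, it is indifferent to action labels and to the size of the action set, which is the property the abstract attributes to the learned policy at a much larger scale.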