Observational learning requires an agent to learn to perform a task by referencing only observations of the performed task. This work investigates the equivalent setting in real-world robot learning, where access to hand-designed rewards and demonstrator actions is not assumed. To address this data-constrained setting, this work presents a planning-based Inverse Reinforcement Learning (IRL) algorithm for world modeling from observation and interaction alone. Experiments conducted entirely in the real world demonstrate that this paradigm is effective for learning image-based manipulation tasks from scratch in under an hour, without assuming prior knowledge, pre-training, or data of any kind beyond task observations. Moreover, this work demonstrates that the learned world model representation is capable of online transfer learning in the real world from scratch. Compared to existing approaches, including IRL, Reinforcement Learning (RL), and Behavior Cloning (BC), which make more restrictive assumptions, the proposed approach achieves significantly greater sample efficiency and success rates, enabling a practical path forward for online world modeling and planning from observation and interaction. Videos and more at: https://uwrobotlearning.github.io/mpail2/.