Understanding the world in terms of objects and the possible interactions with them is an important cognitive ability, especially in robotic manipulation, where many tasks require robot-object interactions. However, learning such a structured world model, one that explicitly captures entities and relationships, remains a challenging and underexplored problem. To address this, we propose FOCUS, a model-based agent that learns an object-centric world model. Thanks to a novel exploration bonus that stems from the object-centric representation, FOCUS can be deployed on robotic manipulation tasks to explore object interactions more easily. Evaluating our approach on manipulation tasks across different settings, we show that object-centric world models allow the agent to solve tasks more efficiently and enable consistent exploration of robot-object interactions. Using a Franka Emika robot arm, we also showcase how FOCUS could be adopted in real-world settings.