Understanding the world in terms of objects and the possible interactions with them is an important cognitive ability, especially in robotic manipulation, where many tasks require robot-object interactions. However, learning such a structured world model, one that specifically captures entities and relationships, remains a challenging and underexplored problem. To address this, we propose FOCUS, a model-based agent that learns an object-centric world model. Thanks to a novel exploration bonus that stems from the object-centric representation, FOCUS can be deployed on robotic manipulation tasks to explore object interactions more easily. Evaluating our approach on manipulation tasks across different settings, we show that object-centric world models allow the agent to solve tasks more efficiently and enable consistent exploration of robot-object interactions. Using a Franka Emika robot arm, we also showcase how FOCUS could be adopted in real-world settings.