Learning Dynamic Attribute-factored World Models for Efficient Multi-object Reinforcement Learning

In many reinforcement learning tasks, the agent has to learn to interact with many objects of different types and to generalize to unseen combinations and numbers of objects. Often a task is a composition of previously learned tasks (e.g., block stacking). These are examples of compositional generalization, in which we compose object-centric representations to solve complex tasks. Recent works have shown the benefits of object-factored representations and hierarchical abstractions for improving sample efficiency in these settings. However, these methods do not fully exploit the benefits of factorization at the level of object attributes. In this paper, we address this gap and introduce the Dynamic Attribute FacTored RL (DAFT-RL) framework. In DAFT-RL, we leverage object-centric representation learning to extract objects from visual inputs. We learn to classify them into classes and to infer their latent parameters. For each class of object, we learn a class template graph that describes how the dynamics and reward of an object of this class factorize according to its attributes. We also learn an interaction pattern graph that describes how objects of different classes interact with each other at the attribute level. Through these graphs and a dynamic interaction graph that models the interactions between objects, we can learn a policy that can be applied directly in a new environment by only estimating the interactions and latent parameters. We evaluate DAFT-RL on three benchmark datasets and show that our framework outperforms the state of the art in generalizing to unseen objects with varying attributes and latent parameters, as well as in composing previously learned tasks.
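
To make the roles of the three graphs more concrete, here is a minimal sketch of one way they could be combined into an attribute-level dependency structure for a transition model. This is not the DAFT-RL implementation. The dict-of-sets encoding, the function name `attribute_parents`, the `("action", "a")` node, and the example classes and attributes (`agent`, `box`, `pos`, `vel`, `color`) are all hypothetical choices made for illustration. The class template graph supplies intra-object edges shared by every object of a class. The interaction pattern graph says which attributes of one class can affect which attributes of another. The dynamic interaction graph decides which object pairs actually interact at the current step.

```python
from typing import Dict, Set, Tuple

Node = Tuple[str, str]  # (object name, attribute name); hypothetical encoding
ACTION: Node = ("action", "a")  # hypothetical node standing for the agent's action


def attribute_parents(
    objects: Dict[str, str],                                     # object -> class
    class_templates: Dict[str, Dict[str, Set[str]]],             # class -> attr -> parent attrs
    interaction_patterns: Dict[Tuple[str, str], Dict[str, Set[str]]],  # (cls_i, cls_j) -> attr_i -> attrs_j
    interacting_pairs: Set[Tuple[str, str]],                     # (i, j): j acts on i at this step
) -> Dict[Node, Set[Node]]:
    """Return, for each (object, attribute), the nodes it depends on at the previous step."""
    parents: Dict[Node, Set[Node]] = {}
    for obj, cls in objects.items():
        for attr, deps in class_templates[cls].items():
            # Intra-object edges come from the class template graph, so every
            # object of the same class shares the same factorization.
            parents[(obj, attr)] = {ACTION if d == "action" else (obj, d) for d in deps}
    for i, j in interacting_pairs:
        # Inter-object edges are added only when the dynamic interaction graph
        # says i and j interact; the class-level interaction pattern graph
        # decides which attributes are connected.
        pattern = interaction_patterns.get((objects[i], objects[j]), {})
        for attr_i, attrs_j in pattern.items():
            parents.setdefault((i, attr_i), set()).update((j, a) for a in attrs_j)
    return parents


# Hypothetical scene: one agent and two boxes; only box1 is in contact with the agent.
objects = {"agent": "agent", "box1": "box", "box2": "box"}
class_templates = {
    "agent": {"pos": {"pos", "vel"}, "vel": {"vel", "action"}},
    # "color" depends only on itself: it plays no role in the box dynamics.
    "box": {"pos": {"pos", "vel"}, "vel": {"vel"}, "color": {"color"}},
}
interaction_patterns = {("box", "agent"): {"vel": {"pos", "vel"}}}  # agent can push a box
interacting_pairs = {("box1", "agent")}

parents = attribute_parents(objects, class_templates, interaction_patterns, interacting_pairs)
print(sorted(parents[("box1", "vel")]))  # -> [('agent', 'pos'), ('agent', 'vel'), ('box1', 'vel')]
```

Because both boxes use the same class template, `box2` gets exactly the same intra-object structure as `box1`; only the velocity of `box1` acquires parents from the agent, since only that pair appears in the dynamic interaction graph at this step. Under these assumptions, moving to a new scene would only require new `objects` and `interacting_pairs`, while the class-level graphs are reused.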
