Autonomous intelligent agents must bridge computational challenges at disparate levels of abstraction, from the low-level spaces of sensory input and motor commands to the high-level domain of abstract reasoning and planning. A key question in designing such agents is how best to instantiate the representational space that will interface between these two levels -- ideally without requiring supervision in the form of expensive data annotations. These objectives can be efficiently achieved by representing the world in terms of objects (grounded in perception and action). In this work, we present a novel, brain-inspired, deep-learning architecture that learns from pixels to interpret, control, and reason about its environment, using object-centric representations. We show the utility of our approach through tasks in synthetic environments that require a combination of (high-level) logical reasoning and (low-level) continuous control. Results show that the agent can learn emergent conditional behavioral reasoning, such as $(A \to B) \land (\neg A \to C)$, as well as logical composition $(A \to B) \land (A \to C) \vdash A \to (B \land C)$ and XOR operations, and can successfully control its environment to satisfy objectives deduced from these logical rules. The agent can adapt online to unexpected changes in its environment and is robust to mild violations of its world model, thanks to its dynamic generation of internal desired goals. While the present results are limited to synthetic settings (2D and 3D activated versions of dSprites), which fall short of real-world levels of complexity, the proposed architecture shows how to manipulate grounded object representations, as a key inductive bias for unsupervised learning, to enable behavioral reasoning.
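The rule forms named above can be made concrete with a minimal sketch. The snippet below is a hypothetical, purely symbolic illustration (it is not the paper's neural architecture): rules are (antecedent, consequent) pairs, and a goal set is derived by matching antecedents against observed facts, showing conditional behaviour $(A \to B) \land (\neg A \to C)$ and the composition $(A \to B) \land (A \to C) \vdash A \to (B \land C)$.

```python
# Hypothetical symbolic sketch of the rule forms in the abstract,
# not the paper's learned agent. Each rule is ((proposition, truth_value),
# goal): the goal becomes desired when the proposition has that value.

def derive_goals(rules, facts):
    """Return the set of goals whose antecedent literal matches `facts`.

    rules: list of ((name, bool), goal) pairs, e.g. (('A', True), 'B').
    facts: dict mapping proposition names to observed truth values.
    """
    return {goal for (name, value), goal in rules if facts.get(name) == value}

# Conditional behaviour (A -> B) ^ (~A -> C): the desired goal
# switches depending on whether A holds in the environment.
conditional = [(('A', True), 'B'), (('A', False), 'C')]
assert derive_goals(conditional, {'A': True}) == {'B'}
assert derive_goals(conditional, {'A': False}) == {'C'}

# Composition (A -> B) ^ (A -> C) |- A -> (B ^ C): when A holds,
# both goals are entailed and must be satisfied jointly.
composed = [(('A', True), 'B'), (('A', True), 'C')]
assert derive_goals(composed, {'A': True}) == {'B', 'C'}
```

In the paper's setting the analogue of `derive_goals` is learned from pixels rather than hand-coded, but the entailment pattern the agent must exhibit is the same: the goal set it pursues is a function of which antecedent conditions currently hold.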