Visual reasoning is essential for building intelligent agents that understand the world and solve problems beyond perception. Differentiable forward reasoning has been developed to integrate reasoning with gradient-based machine learning paradigms. However, because of their memory intensity, most existing approaches do not exploit the full expressivity of first-order logic. They therefore lack a crucial ability for abstract visual reasoning, where agents must reason by analogy over abstract concepts across different scenarios. To overcome this problem, we propose the NEUro-symbolic Message-pAssiNg reasoNer (NEUMANN), a graph-based differentiable forward reasoner that passes messages in a memory-efficient manner and handles structured programs with functors. Moreover, we propose a computationally efficient structure learning algorithm to perform explanatory program induction on complex visual scenes. For evaluation, in addition to conventional visual reasoning tasks, we propose a new task, visual reasoning behind-the-scenes, in which agents must learn abstract programs and then answer queries by imagining scenes that are not observed. We empirically demonstrate that NEUMANN solves visual reasoning tasks efficiently, outperforming neural, symbolic, and neuro-symbolic baselines.
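To make "differentiable forward reasoning by message passing" concrete, here is a minimal sketch under stated assumptions; it is not NEUMANN's implementation. It assumes a ground program represented as a bipartite graph of atom nodes and conjunction nodes (one per ground clause body). Atom-to-conjunction messages use the product t-norm as a soft AND, and conjunction-to-atom messages are merged with the initial (perceived) facts by a probabilistic sum (noisy-or), which is swapped in here as one possible soft OR. The names `GroundClause`, `soft_and`, `soft_or`, and `forward_reason` are hypothetical, and functors, batching, and the memory optimizations described above are omitted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GroundClause:
    """Hypothetical ground clause: head :- body_1, ..., body_n."""
    head: int            # index of the head atom
    body: list[int]      # indices of the body atoms (one conjunction node)
    weight: float = 1.0  # clause confidence in [0, 1]


def soft_and(values) -> float:
    """Product t-norm: a differentiable conjunction of probabilities."""
    result = 1.0
    for v in values:
        result *= v
    return result


def soft_or(values) -> float:
    """Probabilistic sum (noisy-or): a differentiable disjunction in [0, 1]."""
    result = 1.0
    for v in values:
        result *= 1.0 - v
    return 1.0 - result


def forward_reason(initial: list[float], clauses: list[GroundClause], steps: int) -> list[float]:
    """Run `steps` rounds of atom -> conjunction -> atom message passing."""
    v = list(initial)
    for _ in range(steps):
        # Atom -> conjunction messages: soft AND over each clause body.
        conj = [c.weight * soft_and(v[i] for i in c.body) for c in clauses]
        # Conjunction -> atom messages, merged with the initial (perceived) facts.
        incoming = [[x] for x in initial]
        for clause, message in zip(clauses, conj):
            incoming[clause.head].append(message)
        v = [soft_or(msgs) for msgs in incoming]
    return v


# Hypothetical program: above(X,Y) :- on(X,Y).  above(X,Z) :- on(X,Y), above(Y,Z).
atoms = ["on(a,b)", "on(b,c)", "above(a,b)", "above(b,c)", "above(a,c)"]
clauses = [GroundClause(2, [0]), GroundClause(3, [1]), GroundClause(4, [0, 3])]
result = forward_reason([0.9, 0.8, 0.0, 0.0, 0.0], clauses, steps=3)
for name, p in zip(atoms, result):
    print(f"{name}: {p:.2f}")
```

Starting from perceived probabilities for the two `on` facts, the first round derives `above(a,b)` and `above(b,c)`, and the second round chains them into `above(a,c)` with probability 0.9 × 0.8 = 0.72. Because every operation is smooth in the valuation, gradients can flow back to the perception scores and clause weights, which is the property differentiable forward reasoners rely on.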