Learning abstract concepts from images in an unsupervised fashion is challenging because it requires integrating visual perception with generalizable relational reasoning. Moreover, because the task is unsupervised, human users must be able to understand a model's learnt concepts and, where necessary, revise incorrect behaviours. To tackle both the generalizability and interpretability constraints of visual concept learning, we propose Pix2Code, a framework that extends program synthesis to visual relational reasoning by combining explicit, compositional symbolic representations with implicit neural representations. This is achieved by retrieving object representations from images and synthesizing relational concepts as lambda-calculus programs. We evaluate the diverse properties of Pix2Code on two challenging reasoning domains, Kandinsky Patterns and CURI, testing its ability to identify compositional visual concepts that generalize to novel data and concept configurations. In particular, in stark contrast to neural approaches, we show that Pix2Code's representations remain human interpretable and can be easily revised for improved performance.
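To make the idea of "relational concepts as lambda-calculus programs" concrete, the following is a minimal illustrative sketch in Python. It is not Pix2Code's actual DSL or implementation: the `Obj` record, the primitives (`get_color`, `get_shape`, `eq`, `forall`, `exists`) and the example concepts are all hypothetical names chosen for illustration. The sketch assumes a perception module has already turned each image into a list of symbolic object representations.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Obj:
    """Hypothetical symbolic object representation extracted from an image."""
    shape: str
    color: str
    size: float


Scene = List[Obj]

# Hypothetical primitives, written in curried (lambda-calculus-like) form.
get_color = lambda o: o.color
get_shape = lambda o: o.shape
eq = lambda a: lambda b: a == b
forall = lambda pred: lambda scene: all(pred(o) for o in scene)
exists = lambda pred: lambda scene: any(pred(o) for o in scene)

# A concept is a program mapping a scene to True/False.
# Concept: "every object has the same colour as the first object".
same_color = lambda scene: forall(
    lambda o: eq(get_color(scene[0]))(get_color(o))
)(scene)

# Because the concept is an explicit program, a user can read it and revise it,
# e.g. by additionally requiring that the scene contains a triangle.
same_color_with_triangle = lambda scene: (
    same_color(scene) and exists(lambda o: eq("triangle")(get_shape(o)))(scene)
)

# Hand-written example scenes (illustrative data, not from any benchmark).
positive = [Obj("triangle", "red", 0.2), Obj("square", "red", 0.5)]
negative = [Obj("triangle", "red", 0.2), Obj("circle", "blue", 0.4)]
no_triangle = [Obj("square", "red", 0.3), Obj("circle", "red", 0.4)]

if __name__ == "__main__":
    print(same_color(positive), same_color(negative))
    print(same_color_with_triangle(positive), same_color_with_triangle(no_triangle))
```

The point of the sketch is only that a concept expressed this way is compositional (built from reusable primitives) and inspectable, which is what allows a human to revise it; how such programs are actually synthesized from examples is not shown here.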