Visual reasoning is a long-term goal of vision research. In the last decade, several works have attempted to apply deep neural networks (DNNs) to the task of learning visual relations from images, with modest results in terms of how well the learned relations generalize. In recent years, several innovations in DNNs have been developed to enable learning abstract relations from images. In this work, we systematically evaluate a series of DNNs that integrate mechanisms such as slot attention, recurrently guided attention, and external memory on the simplest possible visual reasoning task: deciding whether two objects are the same or different. We find that, although some models generalized the same-different relation to specific types of images better than others, no model was able to generalize this relation across the board. We conclude that abstract visual reasoning remains largely an unresolved challenge for DNNs.