The binding problem in artificial neural networks is actively studied with the goal of achieving human-level recognition by understanding the world in terms of symbol-like entities. In computer vision in particular, object-centric learning (OCL) is extensively researched as a way to better understand complex scenes by acquiring object representations, or slots. While recent OCL studies have made strides on complex images and videos, the interpretability of object representations and the ability to interact with them remain largely unexplored and are a promising direction for the field. In this paper, we introduce a novel method, Slot Attention with Image Augmentation (SlotAug), to explore the possibility of learning interpretable controllability over slots in a self-supervised manner by utilizing an image augmentation strategy. We also introduce the concept of sustainability in controllable slots by enabling iterative and reversible controls over slots through two proposed submethods: Auxiliary Identity Manipulation and Slot Consistency Loss. Extensive empirical studies and theoretical validation confirm the effectiveness of our approach, which offers a novel capability for interpretable and sustainable control of object representations.