The binding problem in artificial neural networks is actively studied with the goal of achieving human-level recognition by understanding the world in terms of symbol-like entities. In computer vision in particular, object-centric learning (OCL) is widely researched as a way to understand complex scenes by learning object representations, or slots. Although recent OCL studies have made strides on complex images and videos, the interpretability and interactivity of object representations remain largely unexplored and are a promising direction for the field. In this paper, we introduce Slot Attention with Image Augmentation (SlotAug), a novel method that explores whether interpretable controllability over slots can be learned in a self-supervised manner by using an image augmentation strategy. We also devise the concept of sustainability in controllable slots by introducing iterative and reversible controls over slots, supported by two proposed submethods: Auxiliary Identity Manipulation and Slot Consistency Loss. Extensive empirical studies and theoretical validation confirm the effectiveness of our approach, which offers a novel capability for interpretable and sustainable control of object representations.