Learning object-centric representations from complex natural environments enables both humans and machines to reason from low-level perceptual features. To capture the compositional entities of a scene, we propose cyclic walks between perceptual features extracted from vision transformers and object entities. First, a slot-attention module interfaces with these perceptual features and produces a finite set of slot representations. These slots can bind to any object entity in the scene via inter-slot competition for attention. Next, we establish entity-feature correspondence with cyclic walks along high-probability transitions, where the transition probabilities are based on the pairwise similarity between perceptual features (aka "parts") and slot-bound object representations (aka "whole"). The whole is greater than its parts and the parts constitute the whole. The part-whole interactions form cycle consistencies that serve as supervisory signals for training the slot-attention module. Our rigorous experiments on \textit{seven} image datasets in \textit{three} \textit{unsupervised} tasks demonstrate that networks trained with our cyclic walks can disentangle foregrounds and backgrounds, discover objects, and segment semantic objects in complex scenes. In contrast to object-centric models equipped with a decoder for pixel-level or feature-level reconstruction, our cyclic walks provide strong learning signals while avoiding computational overhead and improving memory efficiency. Our source code and data are available at: \href{https://github.com/ZhangLab-DeepNeuroCogLab/Parts-Whole-Object-Centric-Learning/}{link}.
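
The cyclic walk between parts and wholes can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function names, the cosine similarity, the temperature value, and the identity-target loss on the whole-parts-whole round trip are all assumptions made for illustration, since the abstract only states that transition probabilities come from pairwise similarity between features and slots.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, eps=1e-8):
    """Scale each row to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def cyclic_walks(features, slots, temperature=0.1, eps=1e-8):
    """Round-trip walks between N part features and K slots.

    features: (N, D) perceptual features ("parts").
    slots:    (K, D) slot representations ("whole").
    Returns the two round-trip transition matrices and an
    illustrative cycle-consistency loss (an assumption, see above).
    """
    f = l2_normalize(features)
    s = l2_normalize(slots)

    # Pairwise cosine similarity, sharpened by a temperature (assumed).
    sim = f @ s.T / temperature                  # (N, K)

    # Transition probabilities in both directions.
    parts_to_whole = softmax(sim, axis=1)        # (N, K), rows sum to 1
    whole_to_parts = softmax(sim.T, axis=1)      # (K, N), rows sum to 1

    # Two-step walks that start and end in the same space.
    whole_cycle = whole_to_parts @ parts_to_whole   # (K, K)
    parts_cycle = parts_to_whole @ whole_to_parts   # (N, N)

    # Assumed objective: a walk starting at slot k should return to slot k,
    # i.e. cross-entropy between whole_cycle and the identity matrix.
    loss = -np.mean(np.log(np.diag(whole_cycle) + eps))
    return whole_cycle, parts_cycle, loss


rng = np.random.default_rng(0)
features = rng.normal(size=(16, 8))   # e.g. 16 patch features of dimension 8
slots = rng.normal(size=(4, 8))       # e.g. 4 slots of dimension 8
whole_cycle, parts_cycle, loss = cyclic_walks(features, slots)
print(whole_cycle.shape, parts_cycle.shape, float(loss))
```

Because both round-trip matrices are products of row-stochastic matrices, each row is itself a probability distribution over end points of the walk. No decoder appears anywhere in the computation, which is the source of the efficiency argument made in the abstract.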