Unsupervised object-centric learning aims to decompose scenes into interpretable object entities, termed slots. Slot-based auto-encoders are a prominent approach to this task. Two aspects are crucial to them: guiding the encoder to generate object-specific slots, and ensuring that the decoder actually uses those slots during reconstruction. This work introduces two techniques: (i) an attention-based self-training approach, which distills the superior slot-based attention masks of the decoder into the encoder, improving object segmentation, and (ii) a patch-order permutation strategy for autoregressive transformers, which strengthens the role of slot vectors in reconstruction. Experiments demonstrate the effectiveness of both strategies. The combined approach significantly surpasses prior slot-based auto-encoder methods in unsupervised object segmentation, especially on complex real-world images. The implementation code (https://github.com/gkakogeorgiou/spot) is publicly available.
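As a rough illustration of the two techniques named in the abstract, the sketch below shows one way they might look in code. It is not the authors' implementation (see the linked repository for that), and every detail is an assumption made for illustration: the function names `patch_orders`, `permute_patches`, `hard_targets`, and `distillation_loss` are hypothetical, the particular set of four scan orders is an arbitrary choice, and the distillation objective is assumed to be a cross-entropy between the encoder (student) attention and hard per-patch slot assignments taken from the decoder (teacher) attention. Attention maps are plain nested lists of shape slots x patches.

```python
import math
import random


def patch_orders(height, width):
    """Return several scan orders over a height x width patch grid.

    Each order is a list of flat patch indices (row * width + col).
    The choice of these four orders is an illustrative assumption.
    """
    row_major = [r * width + c for r in range(height) for c in range(width)]
    col_major = [r * width + c for c in range(width) for r in range(height)]
    return {
        "row_major": row_major,
        "col_major": col_major,
        "row_major_reversed": row_major[::-1],
        "col_major_reversed": col_major[::-1],
    }


def permute_patches(patches, order):
    """Reorder a sequence of patch tokens according to one scan order."""
    return [patches[i] for i in order]


def hard_targets(teacher_attn):
    """Pick, for every patch, the slot with the highest teacher attention."""
    num_slots = len(teacher_attn)
    num_patches = len(teacher_attn[0])
    return [
        max(range(num_slots), key=lambda k: teacher_attn[k][p])
        for p in range(num_patches)
    ]


def distillation_loss(student_attn, teacher_attn, eps=1e-8):
    """Mean cross-entropy of student attention against hard teacher targets.

    Both inputs are slots x patches; each column is assumed to sum to 1
    over slots. This loss form is an assumption, not taken from the source.
    """
    targets = hard_targets(teacher_attn)
    losses = [-math.log(student_attn[k][p] + eps) for p, k in enumerate(targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Sample a scan order, as one might do per training step (assumption).
    orders = patch_orders(2, 3)
    name = random.choice(sorted(orders))
    print(name, permute_patches(["p0", "p1", "p2", "p3", "p4", "p5"], orders[name]))
```

In this sketch, an autoregressive decoder that sees its target patches in a different order at each step cannot rely on a fixed neighbouring-patch context, which is one plausible reading of how permutation pushes it to depend more on the slot vectors. The distillation function only covers the loss computation; matching teacher and student slots, and stopping gradients through the teacher, are left out.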