Learning modular, object-centric representations is crucial for systematic generalization. Existing methods show promising object-binding capabilities empirically, but theoretical identifiability guarantees remain comparatively underdeveloped. Understanding when object-centric representations can theoretically be identified is essential for scaling slot-based methods to high-dimensional images with correctness guarantees. To that end, we propose a probabilistic slot-attention algorithm that imposes an aggregate mixture prior over object-centric slot representations, thereby providing slot-identifiability guarantees without supervision, up to an equivalence relation. We verify our theoretical identifiability result empirically on both simple 2-dimensional data and high-resolution imaging datasets.
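To make the idea of an aggregate mixture prior concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation): slots are treated as components of a Gaussian mixture over input features, attention weights arise as posterior responsibilities in the E-step, and slot parameters are refined by M-step updates; the mixing weights `pi` play the role of the aggregate mixture over slots. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def probabilistic_slot_attention(inputs, num_slots=3, num_iters=3, seed=0):
    """Hypothetical sketch: slots as Gaussian mixture components over
    input features; attention = posterior responsibilities (E-step),
    slot updates = mixture parameter re-estimation (M-step)."""
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    mu = rng.normal(size=(num_slots, d))       # slot means
    sigma2 = np.ones((num_slots, d))           # slot (diagonal) variances
    pi = np.full(num_slots, 1.0 / num_slots)   # aggregate mixing weights

    for _ in range(num_iters):
        # E-step: log-density of each input under each slot's Gaussian
        log_p = -0.5 * (((inputs[:, None, :] - mu[None]) ** 2) / sigma2[None]
                        + np.log(2 * np.pi * sigma2[None])).sum(-1)
        log_p += np.log(pi)[None]
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)           # attention over slots

        # M-step: update slot parameters from responsibility-weighted features
        nk = r.sum(axis=0) + 1e-8
        mu = (r.T @ inputs) / nk[:, None]
        sigma2 = (r.T @ inputs ** 2) / nk[:, None] - mu ** 2 + 1e-6
        pi = nk / n                                 # aggregate mixture prior

    return mu, sigma2, pi, r
```

On well-separated toy 2-D data, the slot means converge toward the cluster centers and the responsibilities bind each point to one slot, which is the simplest instance of the unsupervised slot-binding behavior the abstract describes.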