Learning modular object-centric representations is crucial for systematic generalization. Existing methods show promising object-binding capabilities empirically, but theoretical identifiability guarantees remain relatively underdeveloped. Understanding when object-centric representations can theoretically be identified is important for scaling slot-based methods to high-dimensional images with correctness guarantees. To that end, we propose a probabilistic slot-attention algorithm that imposes an aggregate mixture prior over object-centric slot representations, thereby providing slot identifiability guarantees, up to an equivalence relation, without supervision. We verify our theoretical identifiability result empirically on both simple 2-dimensional data and high-resolution imaging datasets.
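To make the idea concrete, the following is a minimal, hypothetical sketch of slot attention viewed probabilistically: slots are treated as components of a Gaussian mixture over input feature vectors and updated by EM, so the attention weights are posterior responsibilities. The function name, the diagonal-Gaussian assumption, and all hyperparameters are illustrative simplifications, not the paper's actual implementation.

```python
import numpy as np

def probabilistic_slot_attention(inputs, n_slots=3, n_iters=10, seed=0):
    """EM-style sketch: slots as means of a diagonal Gaussian mixture
    fit to a set of input feature vectors (illustrative simplification).

    inputs: array of shape (n, d) -- n feature vectors of dimension d.
    Returns slot means (K, d), responsibilities (n, K), mixing weights (K,).
    """
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    slots = rng.normal(size=(n_slots, d))    # slot means mu_k
    log_var = np.zeros((n_slots, d))         # per-slot log-variances
    pi = np.full(n_slots, 1.0 / n_slots)     # mixing weights

    for _ in range(n_iters):
        # E-step: posterior responsibility of each slot for each input
        # (this plays the role of the attention map in slot attention)
        var = np.exp(log_var)
        diff = inputs[:, None, :] - slots[None, :, :]            # (n, K, d)
        log_p = (-0.5 * ((diff ** 2) / var).sum(-1)
                 - 0.5 * log_var.sum(-1) + np.log(pi))           # (n, K)
        log_p -= log_p.max(axis=1, keepdims=True)                # stabilize
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)                  # softmax over slots

        # M-step: update the mixture parameters from responsibilities
        nk = resp.sum(0) + 1e-8
        slots = (resp.T @ inputs) / nk[:, None]
        diff = inputs[:, None, :] - slots[None, :, :]
        var = np.einsum('nk,nkd->kd', resp, diff ** 2) / nk[:, None] + 1e-6
        log_var = np.log(var)
        pi = nk / n

    return slots, resp, pi
```

On toy 2-dimensional data with well-separated clusters, the slot means converge to the cluster centers, which is the intuition behind verifying identifiability on simple 2-dimensional data before scaling up.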