Learning representations that generalize to novel compositions of known concepts is crucial for bridging the gap between human and machine perception. One prominent effort is learning object-centric representations, which are widely conjectured to enable compositional generalization. Yet it remains unclear when this conjecture holds, because a principled theoretical or empirical understanding of compositional generalization is lacking. In this work, we investigate when compositional generalization is guaranteed for object-centric representations through the lens of identifiability theory. We show that autoencoders that satisfy structural assumptions on the decoder and enforce encoder-decoder consistency will learn object-centric representations that provably generalize compositionally. We validate our theoretical result and highlight the practical relevance of our assumptions through experiments on synthetic image data.