We tackle the problems of latent variable identification and ``out-of-support'' image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which may be statistically dependent and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variation in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.
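To make the notion of an additive decoder concrete, the following is a minimal sketch and not the implementation used in this work. It assumes two one-dimensional latent blocks, a flattened 8x8 image, and small random networks as block decoders; the names `make_block_decoder`, `block_dims`, and `additive_decoder` are hypothetical. The decoder has the form f(z) = f_1(z_1) + f_2(z_2), so the image at a combination of block values that was never observed jointly is already determined by images at combinations that were observed.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_block_decoder(in_dim, out_dim, hidden=16):
    """Return a small random MLP mapping one latent block to a flattened image.

    Random weights stand in for a trained network (illustrative assumption).
    """
    w1 = rng.normal(size=(in_dim, hidden))
    w2 = rng.normal(size=(hidden, out_dim))
    return lambda z_block: np.tanh(z_block @ w1) @ w2


# Hypothetical setup: two latent blocks (e.g. one per object), each 1-D.
block_dims = [1, 1]
image_dim = 64  # flattened 8x8 image
block_decoders = [make_block_decoder(d, image_dim) for d in block_dims]


def additive_decoder(z):
    """Additive decoder: each block produces its own image, and the images are summed."""
    blocks = np.split(z, np.cumsum(block_dims)[:-1], axis=-1)
    return sum(f(zb) for f, zb in zip(block_decoders, blocks))


# Suppose training only covers an L-shaped support: z_1 and z_2 are never both large.
# These three points lie inside that support.
seen_a = additive_decoder(np.array([0.9, 0.1]))
seen_b = additive_decoder(np.array([0.1, 0.9]))
seen_c = additive_decoder(np.array([0.1, 0.1]))

# (0.9, 0.9) lies outside the support but inside the Cartesian product of the
# per-block supports. Additivity fixes the decoder's output there.
novel = additive_decoder(np.array([0.9, 0.9]))
recombined = seen_a + seen_b - seen_c

print(np.allclose(novel, recombined))
```

The identity checked at the end holds for any additive function, which is the mechanism behind Cartesian-product extrapolation in this toy setting; a decoder that mixes the blocks through a shared nonlinearity would generally not satisfy it.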