We tackle the problems of latent variable identification and "out-of-support" image generation in representation learning. We show that both are possible for a class of decoders that we call additive, which are reminiscent of decoders used for object-centric representation learning (OCRL) and well suited for images that can be decomposed as a sum of object-specific images. We provide conditions under which exactly solving the reconstruction problem using an additive decoder is guaranteed to identify the blocks of latent variables up to permutation and block-wise invertible transformations. This guarantee relies only on very weak assumptions about the distribution of the latent factors, which may be statistically dependent and have an almost arbitrarily shaped support. Our result provides a new setting where nonlinear independent component analysis (ICA) is possible and adds to our theoretical understanding of OCRL methods. We also show theoretically that additive decoders can generate novel images by recombining observed factors of variation in novel ways, an ability we refer to as Cartesian-product extrapolation. We show empirically that additivity is crucial for both identifiability and extrapolation on simulated data.
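To make the notion of an additive decoder concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes two scalar latent blocks, a 1-D "image" of 64 pixels, and hypothetical per-object renderers (`render_blob`, `additive_decoder`, and `occlusion_decoder` are names invented here for illustration). It shows one elementary consequence of additivity: the image at an unseen latent combination is determined by images at three seen combinations, which is the intuition behind recombining observed factors.

```python
import numpy as np

# Pixel coordinates of a toy 1-D "image" (an assumption for illustration).
GRID = np.linspace(0.0, 1.0, 64)


def render_blob(position, width, amplitude):
    """Hypothetical object-specific decoder: a Gaussian blob at `position`."""
    return amplitude * np.exp(-0.5 * ((GRID - position) / width) ** 2)


def additive_decoder(z):
    """f(z) = f_1(z_1) + f_2(z_2): each latent block renders its own object."""
    z1, z2 = z
    return render_blob(z1, 0.05, 1.0) + render_blob(z2, 0.05, 0.5)


def occlusion_decoder(z):
    """Non-additive contrast: objects are combined with a pixel-wise max."""
    z1, z2 = z
    return np.maximum(render_blob(z1, 0.05, 1.0), render_blob(z2, 0.05, 0.5))


def recombine(decoder, a, b):
    """Combine three 'seen' images to predict the image at (a_1, b_2)."""
    (a1, a2), (b1, b2) = a, b
    return decoder((a1, a2)) - decoder((b1, a2)) + decoder((b1, b2))


a, b = (0.2, 0.3), (0.7, 0.8)   # two latent points assumed to be observed
unseen = (a[0], b[1])           # combination assumed to be outside the support

additive_ok = np.allclose(recombine(additive_decoder, a, b),
                          additive_decoder(unseen))
occlusion_ok = np.allclose(recombine(occlusion_decoder, a, b),
                           occlusion_decoder(unseen))
print("additive decoder satisfies the identity:", additive_ok)
print("occlusion decoder satisfies the identity:", occlusion_ok)
```

For the additive decoder the identity holds exactly because the block-specific terms cancel; for the max-based toy decoder it does not hold at these latent values. This only illustrates the algebraic structure of additivity; the identifiability and extrapolation guarantees themselves rest on the conditions stated in the paper.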