The process that generates data such as images is governed by independent, unknown factors of variation. Recovering these factors has been studied extensively in disentanglement, causal representation learning, and independent component analysis. Recently, approaches that merge these fields have shown great success. Instead of representing the factors of variation directly, disentanglement can be framed as finding the interventions on an image that each change a single factor. Following this view, we introduce a new disentanglement method, inspired by causal dynamics, that combines causality theory with vector-quantized variational autoencoders. Our model treats the quantized vectors as causal variables and links them in a causal graph. It performs causal interventions on the graph and generates atomic transitions that each affect a single factor of variation in the image. We also introduce a new task, action retrieval, which consists of finding the action responsible for the transition between two images. We test our method on standard synthetic and real-world disentanglement datasets. We show that it can effectively disentangle the factors of variation and perform precise interventions on high-level semantic attributes of an image without degrading image quality, even with imbalanced data distributions.
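To make the three ingredients named above concrete (quantized latent codes, an intervention on a single code, and action retrieval between two images), the following is a minimal toy sketch in plain Python. It is not the model described in this work: there is no encoder, decoder, or learned causal graph, and the function names `quantize`, `intervene`, and `retrieve_action`, as well as the tiny codebook, are hypothetical and chosen only for illustration. It assumes an image has already been encoded into a short list of latent vectors.

```python
# Toy illustration only: all names and values here are assumptions,
# not the architecture or API of the method described in the abstract.

def quantize(latents, codebook):
    """Map each latent vector to the index of its nearest codebook entry
    (squared Euclidean distance), as in vector quantization."""
    codes = []
    for z in latents:
        dists = [sum((a - b) ** 2 for a, b in zip(z, e)) for e in codebook]
        codes.append(dists.index(min(dists)))
    return codes


def intervene(codes, position, new_code):
    """Return a copy of the code list with exactly one position overwritten,
    standing in for an atomic intervention on one latent variable."""
    edited = list(codes)
    edited[position] = new_code
    return edited


def retrieve_action(codes_before, codes_after):
    """Toy action retrieval: report (position, old_code, new_code) for every
    latent position that differs between the two code lists."""
    return [
        (i, before, after)
        for i, (before, after) in enumerate(zip(codes_before, codes_after))
        if before != after
    ]


# Hypothetical 2-D codebook with four entries and three latent vectors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
latents = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]

codes = quantize(latents, codebook)                # nearest entries: [0, 1, 2]
edited = intervene(codes, position=1, new_code=3)  # change only position 1
action = retrieve_action(codes, edited)            # [(1, 1, 3)]
```

In this sketch the "action" is read off by direct comparison of discrete codes. In a learned setting, the changed variable would presumably have to be inferred by a model from the two images, which is what makes action retrieval a nontrivial task.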