The process of generating data such as images is controlled by independent and unknown factors of variation. The retrieval of these factors has been studied extensively in the fields of disentanglement, causal representation learning, and independent component analysis. Recently, approaches that merge these domains have shown great success. Instead of directly representing the factors of variation, the problem of disentanglement can be seen as finding the interventions on one image that yield a change to a single factor. Following this assumption, we introduce a new method for disentanglement inspired by causal dynamics that combines causality theory with vector-quantized variational autoencoders. Our model treats the quantized vectors as causal variables and links them in a causal graph. It performs causal interventions on the graph and generates atomic transitions, each affecting a single factor of variation in the image. We also introduce a new task, action retrieval, which consists of finding the action responsible for the transition between two images. We test our method on standard synthetic and real-world disentanglement datasets. We show that it can effectively disentangle the factors of variation and perform precise interventions on high-level semantic attributes of an image without affecting its quality, even with imbalanced data distributions.
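To make the ideas of quantized vectors, atomic interventions, and action retrieval more concrete, the following is a minimal toy sketch in plain Python. It is not the method described above: it omits the encoder, the decoder, and the causal graph entirely, and assumes that an image is already represented as a short list of latent vectors. The names `quantize`, `intervene`, and `retrieve_action`, as well as the codebook and latent values, are hypothetical and chosen for illustration only.

```python
from typing import List, Optional, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each latent vector to the index of its nearest codebook entry.

    This is the nearest-neighbour lookup used in vector quantization;
    the codebook here is fixed rather than learned (an assumption).
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def intervene(codes: List[int], position: int, new_code: int) -> List[int]:
    """Toy 'atomic' intervention: change exactly one discrete variable."""
    edited = list(codes)
    edited[position] = new_code
    return edited


def retrieve_action(
    before: List[int], after: List[int]
) -> Optional[Tuple[int, int]]:
    """Toy action retrieval: return (position, new_code) if the two code
    sequences differ in exactly one place, otherwise None."""
    diffs = [(i, b) for i, (a, b) in enumerate(zip(before, after)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


# Hypothetical 2-D codebook with three entries and three latent vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
latents = [[0.1, -0.1], [0.9, 0.2], [0.1, 0.8]]

codes = quantize(latents, codebook)        # discrete "causal variables"
edited = intervene(codes, position=1, new_code=2)
action = retrieve_action(codes, edited)    # which single change occurred
```

In this sketch an action is simply a pair of a position and a new code index. In the approach summarized in the abstract, interventions are performed on a causal graph linking the quantized vectors, which this toy example does not model.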