Despite significant recent progress in deep generative models, the underlying structure of their latent spaces remains poorly understood, which leaves semantically meaningful latent traversal an open research challenge. Most prior work has addressed this challenge by modeling latent structure linearly and finding corresponding linear directions that yield "disentangled" generations. In this work, we instead propose to model latent structure with a learned dynamic potential landscape, and to perform latent traversals as the flow of samples down the landscape's gradient. Inspired by physics, optimal transport, and neuroscience, these potential landscapes are learned as physically realistic partial differential equations, which allows them to vary flexibly over both space and time. To achieve disentanglement, multiple potentials are learned simultaneously and are constrained by a classifier to be distinct and semantically self-consistent. Experimentally, we demonstrate that our method achieves trajectories that are both qualitatively and quantitatively more disentangled than those of state-of-the-art baselines. Further, we demonstrate that our method can be integrated as a regularization term during training, where it acts as an inductive bias towards learning structured representations and ultimately improves model likelihood on similarly structured data.
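The idea of a latent traversal as a flow down a time-varying potential can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the potential is a hand-written toy quadratic, u(z, t) = 0.5 * ||z - c(t)||^2, with a minimum that drifts along the first axis, rather than a learned, PDE-constrained potential, and the names `toy_potential_grad` and `traverse` are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def toy_potential_grad(z: Sequence[float], t: float) -> Vector:
    """Gradient of the toy potential u(z, t) = 0.5 * ||z - c(t)||^2.

    The minimum c(t) = (t, 0, ..., 0) drifts along the first axis, so the
    landscape varies over both space and time. This is an illustrative
    stand-in, not a learned potential.
    """
    center = [t] + [0.0] * (len(z) - 1)
    return [zi - ci for zi, ci in zip(z, center)]


def traverse(
    z0: Sequence[float],
    grad_u: Callable[[Sequence[float], float], Vector],
    n_steps: int = 10,
    step_size: float = 0.5,
    dt: float = 0.1,
) -> List[Vector]:
    """Move a latent sample down the potential's gradient with Euler steps.

    Returns the full trajectory, including the starting point.
    """
    z = list(z0)
    path = [list(z)]
    for k in range(n_steps):
        g = grad_u(z, k * dt)  # gradient of the landscape at time k * dt
        z = [zi - step_size * gi for zi, gi in zip(z, g)]
        path.append(list(z))
    return path


if __name__ == "__main__":
    for step, z in enumerate(traverse([2.0, -1.0], toy_potential_grad)):
        print(step, [round(v, 4) for v in z])
```

In a setting like the one described above, `grad_u` would presumably be replaced by the gradient of a learned potential (one per semantic factor), and each point on the trajectory would be decoded by the generative model to produce the traversal.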