Image-to-image translation (i2i) networks suffer from entanglement effects in the presence of physics-related phenomena in the target domain (such as occlusions, fog, etc.), which lowers translation quality, controllability, and variability. In this paper, we propose a general framework to disentangle visual traits in target images. Primarily, we build upon a collection of simple physics models, guiding the disentanglement with a physical model that renders some of the target traits while the remaining ones are learned. Because physics allows explicit and interpretable outputs, our physical models (optimally regressed on the target) allow generating unseen scenarios in a controllable manner. Secondarily, we show the versatility of our framework by extending it to neural-guided disentanglement, where a generative network is used in place of a physical model when the latter is not directly accessible. Altogether, we introduce three disentanglement strategies, guided by either a fully differentiable physics model, a (partially) non-differentiable physics model, or a neural network. The results show that our disentanglement strategies dramatically improve performance, both qualitatively and quantitatively, in several challenging image translation scenarios.