We present a deformable generator model that disentangles appearance and geometric information in both image and video data in a purely unsupervised manner. The appearance generator network models appearance-related information, including color, illumination, and identity or category, while the geometric generator performs geometric warping, such as rotation and stretching, by generating a deformation field that is used to warp the generated appearance into the final image or video sequence. The two generators take independent latent vectors as input to disentangle the appearance and geometric information in images or video sequences. For video data, a nonlinear transition model is introduced into both the appearance and geometric generators to capture the dynamics over time. The proposed scheme is general and can be easily integrated into different generative models. An extensive set of qualitative and quantitative experiments shows that the appearance and geometric information can be well disentangled, and that the learned geometric generator can be conveniently transferred to other image datasets to facilitate knowledge transfer tasks.
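To make the two-generator structure concrete, the following is a minimal sketch of the generation path described above: one latent vector produces an appearance image, an independent latent vector produces a deformation field, and the field warps the appearance into the final image. It is not the proposed model. The functions `appearance_generator`, `geometric_generator`, `warp`, and `generate` are hypothetical names, and both generators are hand-written toy functions standing in for learned networks. The bilinear sampling and the zero padding outside the image are likewise assumptions made for illustration.

```python
import math


def appearance_generator(z_app, size=5):
    """Toy stand-in for the appearance generator: latent -> grayscale image.

    Hypothetical: a vertical bar whose brightness is set by z_app[0].
    """
    mid = size // 2
    return [[z_app[0] if x == mid else 0.0 for x in range(size)]
            for _ in range(size)]


def geometric_generator(z_geo, size=5):
    """Toy stand-in for the geometric generator: latent -> deformation field.

    Hypothetical: z_geo = (rotation angle in radians, stretch factor).
    Returns per-pixel displacements (dx, dy), so that output pixel (x, y)
    samples the appearance image at (x + dx, y + dy).
    """
    angle, stretch = z_geo
    c = (size - 1) / 2.0  # rotate and stretch about the image centre
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    field = []
    for y in range(size):
        row = []
        for x in range(size):
            u, v = (x - c) / stretch, (y - c) / stretch
            src_x = cos_a * u - sin_a * v + c
            src_y = sin_a * u + cos_a * v + c
            row.append((src_x - x, src_y - y))
        field.append(row)
    return field


def warp(image, field):
    """Bilinear warp of `image` by `field`; samples outside the image are 0."""
    h, w = len(image), len(image[0])

    def pixel(ix, iy):
        return image[iy][ix] if 0 <= ix < w and 0 <= iy < h else 0.0

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = field[y][x]
            sx, sy = x + dx, y + dy
            x0, y0 = math.floor(sx), math.floor(sy)
            fx, fy = sx - x0, sy - y0
            top = (1 - fx) * pixel(x0, y0) + fx * pixel(x0 + 1, y0)
            bottom = (1 - fx) * pixel(x0, y0 + 1) + fx * pixel(x0 + 1, y0 + 1)
            row.append((1 - fy) * top + fy * bottom)
        out.append(row)
    return out


def generate(z_app, z_geo, size=5):
    """Final image = appearance image warped by the deformation field."""
    return warp(appearance_generator(z_app, size),
                geometric_generator(z_geo, size))


if __name__ == "__main__":
    # Same appearance latent, two different geometric latents.
    upright = generate((1.0,), (0.0, 1.0))
    rotated = generate((1.0,), (math.pi / 2, 1.0))
    for row in rotated:
        print(" ".join(f"{value:.1f}" for value in row))
```

In this toy setting, changing only `z_geo` rotates or stretches the bar without altering its brightness, and changing only `z_app` alters brightness without moving the bar. That separation is the behavior the independent latent vectors are meant to encourage in the learned model.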