Denoising Diffusion Models (DDMs) are emerging as the state of the art in deep generative modeling, challenging the dominance of Generative Adversarial Networks. However, exploring the semantics of their latent space and identifying trajectories that manipulate and edit important attributes of the generated samples remains difficult, primarily because of the high dimensionality of the latent space. In this study, we concentrate on face rotation, which is known to be one of the most intricate editing operations. By leveraging a recent embedding technique for Denoising Diffusion Implicit Models (DDIM), we achieve, in many cases, noteworthy manipulations covering rotation angles of up to $\pm 30^{\circ}$ while preserving the distinctive characteristics of the individual. Our methodology uses linear regression to compute trajectories that approximate clouds of latent representations of dataset samples with different yaw rotations. Trajectories specific to a source image are obtained by restricting the analysis to subsets of the data that share significant attributes with that image. One of these attributes is the direction of the light source: a byproduct of our research is a labeling of CelebA that categorizes images into three major groups based on the illumination direction: left, center, and right.
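The regression step described in the abstract can be illustrated with a minimal sketch. It assumes the latent representations are available as NumPy arrays paired with yaw angles in degrees, and it fits one straight line per latent dimension by ordinary least squares. The function names (`fit_yaw_trajectory`, `move_along_trajectory`, `select_subset`), the array shapes, and the synthetic data are all hypothetical, chosen for illustration only; the abstract does not specify the actual embedding or regression pipeline.

```python
import numpy as np


def fit_yaw_trajectory(latents, yaws):
    """Least-squares fit of latent ~= yaw * slope + intercept, per latent dimension.

    latents: array of shape (N, ...) holding one latent per sample (assumed layout).
    yaws:    array of shape (N,) with the yaw angle of each sample, in degrees.
    Returns flattened (slope, intercept), each of shape (D,).
    """
    latents = np.asarray(latents, dtype=float)
    yaws = np.asarray(yaws, dtype=float)
    flat = latents.reshape(len(latents), -1)                # (N, D)
    design = np.stack([yaws, np.ones_like(yaws)], axis=1)   # (N, 2)
    coef, *_ = np.linalg.lstsq(design, flat, rcond=None)    # (2, D)
    slope, intercept = coef
    return slope, intercept


def move_along_trajectory(latent, source_yaw, target_yaw, slope):
    """Shift a latent along the fitted direction by the requested yaw change."""
    latent = np.asarray(latent, dtype=float)
    return latent + (target_yaw - source_yaw) * slope.reshape(latent.shape)


def select_subset(latents, yaws, attributes, wanted):
    """Keep only samples whose attribute label (e.g. light direction) matches."""
    mask = np.asarray(attributes) == wanted
    return np.asarray(latents)[mask], np.asarray(yaws)[mask]


if __name__ == "__main__":
    # Synthetic stand-in for embedded dataset samples (not real CelebA latents).
    rng = np.random.default_rng(0)
    dim, n = 16, 200
    yaws = rng.uniform(-30.0, 30.0, size=n)
    direction = rng.normal(size=dim)
    latents = yaws[:, None] * direction + rng.normal(scale=0.1, size=(n, dim))
    light = rng.choice(["left", "center", "right"], size=n)

    # Restrict the fit to samples sharing the source image's light direction.
    sub_latents, sub_yaws = select_subset(latents, yaws, light, "left")
    slope, _ = fit_yaw_trajectory(sub_latents, sub_yaws)

    # Move one latent from its own yaw to a target yaw of +30 degrees.
    rotated = move_along_trajectory(sub_latents[0], sub_yaws[0], 30.0, slope)
    print(rotated.shape)
```

In this sketch the edited latent would then be passed to the generative model's decoder; that step is omitted because it depends on the specific DDIM implementation.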