Editing the pose and body shape of a person in an image has received increasing attention. However, current methods often suffer from dataset biases, and they degrade both realism and the person's identity when users make large edits. We propose a one-shot approach that enables large edits while preserving identity. To enable large edits, we fit a 3D body model, project the input image onto the 3D model, and change the body's pose and shape. Because this initial textured body model has artifacts caused by occlusion and inaccurate body shape, the rendered image undergoes diffusion-based refinement. In this refinement, strong noise destroys body structure and identity, whereas insufficient noise fails to correct the artifacts. We therefore propose an iterative refinement with weak noise, applied first to the whole body and then to the face. We further enhance realism by fine-tuning text embeddings via self-supervised learning. Quantitative and qualitative evaluations demonstrate that our method outperforms existing methods across various datasets.
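The iterative weak-noise refinement described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`add_noise`, `refine`, `toy_denoise`), the noise level `alpha_bar`, the iteration count, and the use of a binary face mask are all assumptions made for illustration, and `toy_denoise` is a stand-in for a real diffusion denoiser. The sketch only shows the control flow: add a small amount of noise, denoise, and repeat, first over the whole image and then restricted to a face region.

```python
import numpy as np

def add_noise(image, alpha_bar, rng):
    """Forward diffusion step: x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps."""
    eps = rng.standard_normal(image.shape)
    return np.sqrt(alpha_bar) * image + np.sqrt(1.0 - alpha_bar) * eps

def refine(image, denoise, alpha_bar=0.9, num_iters=3, mask=None, seed=0):
    """Repeat weak noising followed by denoising.

    alpha_bar close to 1 means weak noise (assumed value, not from the paper).
    If a mask is given, only masked pixels are updated (assumed mechanism).
    """
    rng = np.random.default_rng(seed)
    x = image.astype(np.float64)
    for _ in range(num_iters):
        noisy = add_noise(x, alpha_bar, rng)
        cleaned = denoise(noisy, alpha_bar)
        x = cleaned if mask is None else mask * cleaned + (1.0 - mask) * x
    return x

def toy_denoise(noisy, alpha_bar):
    """Hypothetical stand-in, NOT a diffusion model: rescale and clip to [0, 1]."""
    return np.clip(noisy / np.sqrt(alpha_bar), 0.0, 1.0)

# Toy "rendered image" and a hypothetical face region in the upper part.
rendered = np.full((8, 8, 3), 0.5)
face_mask = np.zeros((8, 8, 1))
face_mask[:4, 2:6] = 1.0

body_pass = refine(rendered, toy_denoise, seed=0)                  # whole body first
final = refine(body_pass, toy_denoise, mask=face_mask, seed=1)     # then the face only
```

In a real pipeline, `toy_denoise` would be replaced by a pretrained diffusion model's reverse process; the trade-off the abstract describes corresponds to the choice of `alpha_bar` (noise strength) and `num_iters`.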