Editing pose and body shape in human images has received increasing attention. However, current methods often struggle with dataset biases, and they degrade both realism and the person's identity when users make large edits. We propose a one-shot approach that enables large edits while preserving identity. To enable large edits, we fit a 3D body model, project the input image onto the 3D model, and change the body's pose and shape. Because this initial textured body model has artifacts caused by occlusion and inaccurate body shape, the rendered image undergoes a diffusion-based refinement. In this refinement, strong noise destroys body structure and identity, whereas insufficient noise fails to remove the artifacts. We therefore propose an iterative refinement with weak noise, applied first to the whole body and then to the face. We further enhance realism by fine-tuning text embeddings via self-supervised learning. Quantitative and qualitative evaluations demonstrate that our method outperforms existing methods across various datasets.
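The iterative weak-noise refinement can be illustrated with a minimal sketch. It is not the implementation used in this work: `forward_noise`, `iterative_refine`, and `toy_denoise` are hypothetical names, the denoiser is a trivial stand-in for a pretrained diffusion model, and restricting the second pass to the face with a binary mask is an assumption about how "whole body first, then face" might be realized.

```python
import numpy as np

def forward_noise(x, strength, rng):
    """Mix the image with Gaussian noise; a small `strength` means weak noise."""
    eps = rng.standard_normal(x.shape)
    return np.sqrt(1.0 - strength) * x + np.sqrt(strength) * eps

def iterative_refine(image, denoise, strength=0.1, rounds=5, mask=None, seed=0):
    """Repeat (weak noise -> denoise) several times instead of one strong pass.

    `denoise` is a hypothetical callable (noisy_image, strength) -> image.
    If `mask` is given (1 inside the region, 0 outside), only the masked
    region is updated; pixels outside it are left unchanged.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(image, dtype=np.float64)
    for _ in range(rounds):
        noisy = forward_noise(x, strength, rng)
        refined = denoise(noisy, strength)
        x = refined if mask is None else mask * refined + (1.0 - mask) * x
    return x

def toy_denoise(noisy, strength):
    """Placeholder only: clips values to [0, 1]. A real system would call a
    pretrained diffusion model here."""
    return np.clip(noisy, 0.0, 1.0)

# Stage 1: refine the whole rendered image.
rendered = np.random.default_rng(1).random((8, 8))
body = iterative_refine(rendered, toy_denoise, strength=0.1, rounds=5)

# Stage 2: refine only an assumed face region (top-left 4x4 block here).
face_mask = np.zeros((8, 8))
face_mask[:4, :4] = 1.0
result = iterative_refine(body, toy_denoise, strength=0.1, rounds=5, mask=face_mask)
```

Keeping `strength` small in each round limits how far a single step can move the image away from the rendered structure, while repeating the step gives the denoiser several chances to remove artifacts. The values of `strength` and `rounds` here are arbitrary illustration values.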