Generative text-to-image models such as Stable Diffusion have demonstrated a remarkable ability to generate diverse, high-quality images. However, they are surprisingly inept at rendering human hands, which are often anatomically incorrect or fall into the "uncanny valley". In this paper, we propose HandCraft, a method for restoring such malformed hands. HandCraft uses a parametric hand model to automatically construct masks and depth images as conditioning signals, allowing a diffusion-based image editor to fix the hand's anatomy and adjust its pose while seamlessly blending the changes into the original image, preserving pose, color, and style. Our plug-and-play hand restoration solution is compatible with existing pretrained diffusion models and requires no fine-tuning or training of those models, which facilitates adoption. We also contribute the MalHand datasets, which contain generated images with a wide variety of malformed hands in several styles, for hand-detector training and hand-restoration benchmarking. Through qualitative and quantitative evaluation, we demonstrate that HandCraft not only restores anatomical correctness but also maintains the integrity of the overall image.
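The conditioning step the abstract describes, deriving an inpainting mask and a depth map for a detected hand region so a diffusion-based editor can repaint only that area, can be sketched as below. This is a minimal illustration, not the paper's implementation: the bounding-box interface and function name are assumptions, and in HandCraft the depth patch would come from a fitted parametric hand model rather than being passed in directly.

```python
import numpy as np

def build_hand_conditioning(image_shape, bbox, depth_patch=None):
    """Construct conditioning signals for diffusion-based hand restoration.

    image_shape: (H, W) of the source image.
    bbox: (x0, y0, x1, y1) of the detected malformed hand (illustrative interface).
    depth_patch: optional (y1-y0, x1-x0) depth rendering of a corrected hand,
        standing in for the output of a fitted parametric hand model.
    Returns a binary inpainting mask and a full-size depth conditioning image.
    """
    h, w = image_shape[:2]
    x0, y0, x1, y1 = bbox

    # Binary mask: the region the diffusion editor is allowed to repaint.
    mask = np.zeros((h, w), dtype=np.uint8)
    mask[y0:y1, x0:x1] = 255

    # Depth conditioning image (e.g., for a depth-conditioned ControlNet):
    # zero everywhere except the rendered hand patch.
    depth = np.zeros((h, w), dtype=np.float32)
    if depth_patch is not None:
        depth[y0:y1, x0:x1] = depth_patch

    return mask, depth
```

The mask restricts edits to the hand region so the rest of the image is preserved, while the depth image guides the editor toward the corrected hand pose.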