Although diffusion models can generate high-quality human images, their applications are limited by their instability in generating hands with correct structure. Some previous works mitigate this problem by taking hand structure into account, yet they struggle to maintain style consistency between the refined hands and the other image regions. In this paper, we aim to resolve inconsistency in both hand structure and style. We propose RHanDS, a conditional diffusion-based framework that refines the hand region with decoupled structure and style guidance. Specifically, the structure guidance is the hand mesh reconstructed from the malformed hand and serves to correct the hand structure. The style guidance is a hand image, e.g., the malformed hand itself, and provides the style reference for hand refinement. To suppress structure leakage when referencing hand style and to make effective use of hand data to improve the model's capability, we build a multi-style hand dataset and introduce a two-stage training strategy. In the first stage, we train on paired hand images so that the model generates hands in the same style as the reference. In the second stage, we train on various hand images generated from human meshes so that the model gains control over hand structure. We evaluate our method and its counterparts on the test set of the proposed multi-style hand dataset. The experimental results show that, compared with previous methods, RHanDS refines hands effectively with respect to both structure and style. The code and datasets will be available soon.