Generative models such as GANs and diffusion models have demonstrated impressive image generation capabilities. Despite these successes, these systems are surprisingly poor at generating images that contain hands. We propose a novel training framework for generative models that substantially improves their ability to generate hand images. Our approach augments each training image with three additional channels that annotate the hands in the image. These annotations provide additional structure that coaxes the generative model into producing higher-quality hand images. We demonstrate the approach on two different generative models, a generative adversarial network and a diffusion model, and on two kinds of data: a new synthetic dataset of hand images and real photographs that contain hands. We measure the improvement in the quality of the generated hands as higher confidence in finger joint identification by an off-the-shelf hand detector.
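The channel augmentation described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: that the hand annotation is stored as a three-channel array with the same height and width as the RGB image, that the extra channels are appended after the RGB channels, and that a model trained on such data emits six channels from which the RGB part is recovered by slicing. The function names `augment_with_annotation` and `split_generated` are hypothetical, and the arrays are placeholder data, not real images.

```python
import numpy as np


def augment_with_annotation(rgb, annotation):
    """Stack an H x W x 3 image and an H x W x 3 annotation into H x W x 6.

    Assumption: the annotation is pixel-aligned with the image and has
    exactly three channels; its encoding is not specified in the abstract.
    """
    if rgb.ndim != 3 or rgb.shape[-1] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if annotation.shape != rgb.shape:
        raise ValueError("annotation must have the same shape as rgb")
    return np.concatenate([rgb, annotation], axis=-1)


def split_generated(sample):
    """Split a six-channel sample into its RGB and annotation parts.

    Assumption: channels 0-2 are RGB and channels 3-5 are the annotation.
    """
    if sample.shape[-1] != 6:
        raise ValueError("sample must have six channels")
    return sample[..., :3], sample[..., 3:]


# Placeholder data standing in for a training image and its annotation.
rgb = np.zeros((64, 64, 3), dtype=np.float32)
annotation = np.ones((64, 64, 3), dtype=np.float32)

augmented = augment_with_annotation(rgb, annotation)
image, hand_channels = split_generated(augmented)
```

In this sketch the generative model itself is unchanged apart from the number of input and output channels, so the same augmentation could in principle be placed in front of either a GAN or a diffusion model.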