Text-to-image diffusion models have advanced significantly in conditional image generation. However, these models often struggle to render humans accurately, producing distorted limbs and other anomalies. This issue stems primarily from diffusion models' insufficient recognition and evaluation of limb quality. To address it, we introduce AbHuman, the first large-scale synthesized human benchmark focusing on anatomical anomalies. The benchmark consists of 56K synthesized human images, each annotated with detailed bounding-box-level labels, which together identify 147K human anomalies in 18 categories. This benchmark makes it possible to establish recognition of human anomalies, which in turn improves image generation through traditional techniques such as negative prompting and guidance. To improve results further, we propose HumanRefiner, a novel plug-and-play approach for the coarse-to-fine refinement of human anomalies in text-to-image generation. Specifically, HumanRefiner uses a self-diagnostic procedure to detect and correct issues related to both coarse-grained abnormal human poses and fine-grained anomaly levels, facilitating pose-reversible diffusion generation. Experimental results on the AbHuman benchmark demonstrate that HumanRefiner significantly reduces generative discrepancies, achieving a 2.9x improvement in limb quality compared to the state-of-the-art open-source generator SDXL and a 1.4x improvement over DALL-E 3 in human evaluations. Our data and code are available at https://github.com/Enderfga/HumanRefiner.