In this paper, we introduce a Key-point-guided Diffusion probabilistic Model (KDM) that gains precise control over images by manipulating the object's key points. We propose a two-stage generative model that produces an optical flow map as an intermediate output. This gives the model a dense, pixel-wise understanding of the semantic relation between the image and the sparse key points, leading to more realistic image generation. Additionally, the optical flow helps regulate the inter-frame variance of sequential images, yielding realistic image sequences. We evaluate the KDM on diverse key-point-conditioned image synthesis tasks, including facial image generation, human pose synthesis, and echocardiography video prediction, and show that it produces more consistent and photo-realistic images than state-of-the-art models.
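To make the role of a dense optical flow map concrete, the following is a minimal sketch of backward warping, one common way to apply a flow field to an image. It is an illustration only: the abstract does not state how the KDM consumes its intermediate flow map, and the function name `warp_with_flow`, the `(dx, dy)` flow convention, and the border clamping are all assumptions made for this example. Plain Python lists are used to keep the sketch dependency-free.

```python
import math


def warp_with_flow(image, flow):
    """Backward-warp a grayscale image with a dense flow field.

    Hypothetical helper for illustration; not taken from the KDM paper.
    image: H x W nested list of floats.
    flow:  H x W nested list of (dx, dy) tuples; output pixel (x, y) is
           sampled from the source image at (x + dx, y + dy).
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            # Clamp the sampling position to the image border (an assumption).
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            wx, wy = sx - x0, sy - y0
            # Bilinear interpolation between the four neighbouring pixels.
            top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
            bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
            out[y][x] = (1 - wy) * top + wy * bottom
    return out


# Toy example: a 3x3 ramp image sampled with a uniform flow of (+1, 0).
image = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
flow = [[(1.0, 0.0)] * 3 for _ in range(3)]
warped = warp_with_flow(image, flow)
```

In a two-stage pipeline of the kind described above, a first stage would predict a flow field like `flow` from the sparse key points, and a second stage could use the warped result as a dense, pixel-aligned condition. That wiring is a plausible reading of the abstract, not a confirmed detail of the method.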