Diffusion models have recently demonstrated their effectiveness in generating extremely high-quality images and are now used in a wide range of applications, including automatic sketch colorization. Although many methods have been developed for guided sketch colorization, potential conflicts between image prompts and sketch inputs, which can severely degrade the results, remain largely unexplored. This paper therefore thoroughly investigates reference-based sketch colorization models, which colorize sketch images using reference color images. We specifically investigate two critical aspects of reference-based diffusion models: the "distribution problem", which is a major shortcoming compared to text-based counterparts, and the capability for zero-shot sequential text-based manipulation. We introduce two variations of an image-guided latent diffusion model that use different image tokens from the pre-trained CLIP image encoder, and we propose corresponding manipulation methods that adjust their results sequentially using weighted text inputs. We evaluate our models comprehensively through qualitative and quantitative experiments as well as a user study.