Diffusion models have recently proven effective at generating very high-quality images and are now used in a wide range of applications, including automatic sketch colorization. Although many methods have been developed for guided sketch colorization, potential conflicts between image prompts and sketch inputs, which can severely degrade results, remain little explored. This paper therefore investigates in depth reference-based sketch colorization models, which colorize sketch images using reference color images. We specifically investigate two critical aspects of reference-based diffusion models: the "distribution problem", a major shortcoming compared to text-based counterparts, and their capability for zero-shot sequential text-based manipulation. We introduce two variations of an image-guided latent diffusion model that use different image tokens from the pre-trained CLIP image encoder, and propose corresponding manipulation methods that adjust their results sequentially using weighted text inputs. We evaluate our models comprehensively through qualitative and quantitative experiments as well as a user study.