We discover that pretrained diffusion models, although trained mainly on images, are surprisingly effective at guiding sketch synthesis. In this paper, we present DiffSketcher, an innovative algorithm that creates \textit{vectorized} free-hand sketches from natural language input. DiffSketcher builds on a pretrained text-to-image diffusion model. It directly optimizes a set of B\'ezier curves with an extended version of the score distillation sampling (SDS) loss, which allows a raster-level diffusion model to serve as a prior for optimizing a parametric vectorized sketch generator. Furthermore, we use the attention maps embedded in the diffusion model to initialize strokes effectively, which speeds up the generation process. The generated sketches exhibit multiple levels of abstraction while maintaining the recognizability, underlying structure, and essential visual details of the subject drawn. Our experiments show that DiffSketcher produces higher-quality sketches than prior work. The code and demo of DiffSketcher can be found at https://ximinng.github.io/DiffSketcher-project/.
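To make the phrase "optimizing a set of B\'ezier curves" concrete, the following is a minimal sketch of how a single stroke can be represented by its control points and sampled into 2-D points. It is an illustration under stated assumptions, not DiffSketcher's implementation: the cubic degree, the function name `cubic_bezier`, and the example control points are all hypothetical, and the differentiable rasterization and SDS loss that would drive the optimization are omitted.

```python
import numpy as np


def cubic_bezier(control_points, num_samples=50):
    """Sample points along one cubic Bezier curve.

    control_points: array-like of shape (4, 2). In a sketch optimizer these
        would be the learnable stroke parameters (an assumption here).
    Returns an array of shape (num_samples, 2).
    """
    p = np.asarray(control_points, dtype=float)
    if p.shape != (4, 2):
        raise ValueError("expected four 2-D control points")

    # Curve parameter t in [0, 1], as a column so it broadcasts over x and y.
    t = np.linspace(0.0, 1.0, num_samples)[:, None]

    # Degree-3 Bernstein basis polynomials.
    b0 = (1 - t) ** 3
    b1 = 3 * (1 - t) ** 2 * t
    b2 = 3 * (1 - t) * t ** 2
    b3 = t ** 3

    return b0 * p[0] + b1 * p[1] + b2 * p[2] + b3 * p[3]


# Hypothetical stroke: an arch from (0, 0) to (1, 0).
stroke = np.array([[0.0, 0.0], [0.25, 1.0], [0.75, 1.0], [1.0, 0.0]])
points = cubic_bezier(stroke, num_samples=5)
```

Because every sampled point is a smooth function of the control points, a loss computed on a rendering of the stroke can in principle be backpropagated to those control points, which is what makes a parametric representation like this suitable for gradient-based optimization.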