We discover that pretrained diffusion models, although trained mainly on images, show impressive power in guiding sketch synthesis. In this paper, we present DiffSketcher, an innovative algorithm that creates \textit{vectorized} free-hand sketches from natural language input. DiffSketcher is built on a pretrained text-to-image diffusion model. It performs the task by directly optimizing a set of B\'ezier curves with an extended version of the score distillation sampling (SDS) loss, which allows us to use a raster-level diffusion model as a prior for optimizing a parametric vectorized sketch generator. Furthermore, we explore the attention maps embedded in the diffusion model for effective stroke initialization, which speeds up the generation process. The generated sketches demonstrate multiple levels of abstraction while maintaining the recognizability, underlying structure, and essential visual details of the subject drawn. Our experiments show that DiffSketcher produces higher-quality sketches than prior work. The code and demo of DiffSketcher can be found at https://ximinng.github.io/DiffSketcher-project/.
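To make the idea of directly optimizing B\'ezier control points concrete, the following is a minimal sketch and not the DiffSketcher implementation. It assumes a single cubic curve and replaces the SDS loss with a simple mean squared error to a set of target points, purely as a stand-in, since the real loss requires a diffusion model and a differentiable rasterizer. All function names (`bernstein3`, `bezier_point`, `mse_loss`, `fit_step`) are hypothetical.

```python
def bernstein3(t):
    """Cubic Bernstein basis weights at parameter t in [0, 1]."""
    s = 1.0 - t
    return (s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3)


def bezier_point(ctrl, t):
    """Point on the cubic Bezier curve defined by 4 control points (x, y)."""
    w = bernstein3(t)
    x = sum(wi * p[0] for wi, p in zip(w, ctrl))
    y = sum(wi * p[1] for wi, p in zip(w, ctrl))
    return (x, y)


def mse_loss(ctrl, targets, ts):
    """Stand-in loss (NOT SDS): mean squared distance to target points."""
    total = 0.0
    for t, (tx, ty) in zip(ts, targets):
        px, py = bezier_point(ctrl, t)
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(ts)


def fit_step(ctrl, targets, ts, lr=0.5):
    """One gradient-descent step on the control points.

    The curve is linear in its control points, so the gradient of the
    squared error with respect to control point i is the residual
    scaled by the i-th Bernstein weight.
    """
    n = len(ts)
    grads = [[0.0, 0.0] for _ in ctrl]
    for t, (tx, ty) in zip(ts, targets):
        w = bernstein3(t)
        px, py = bezier_point(ctrl, t)
        for i in range(4):
            grads[i][0] += 2.0 * (px - tx) * w[i] / n
            grads[i][1] += 2.0 * (py - ty) * w[i] / n
    return [(p[0] - lr * g[0], p[1] - lr * g[1]) for p, g in zip(ctrl, grads)]


# Toy usage: recover a target stroke starting from a straight line.
ts = [i / 10 for i in range(11)]
target_ctrl = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
targets = [bezier_point(target_ctrl, t) for t in ts]

ctrl = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (1.0, 0.0)]
initial_loss = mse_loss(ctrl, targets, ts)
for _ in range(200):
    ctrl = fit_step(ctrl, targets, ts)
final_loss = mse_loss(ctrl, targets, ts)
```

In this toy setting the gradient is available in closed form. With a diffusion prior, the same control-point update would instead be driven by gradients propagated through a differentiable rasterizer, which this sketch does not attempt to model.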