We find that pretrained diffusion models, although trained mainly on images, show impressive power in guiding sketch synthesis. In this paper, we present DiffSketcher, an innovative algorithm that creates vectorized free-hand sketches from natural language input. DiffSketcher builds on a pretrained text-to-image diffusion model. It directly optimizes a set of Bézier curves with an extended version of the score distillation sampling (SDS) loss, which allows a raster-level diffusion model to serve as a prior for optimizing a parametric vectorized sketch generator. Furthermore, we explore the attention maps embedded in the diffusion model for effective stroke initialization, which speeds up the generation process. The generated sketches exhibit multiple levels of abstraction while maintaining the recognizability, underlying structure, and essential visual details of the subject drawn. Our experiments show that DiffSketcher produces higher-quality sketches than prior work.
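To make the two ingredients named in the abstract concrete, the following is a minimal sketch, not the DiffSketcher implementation. It assumes cubic Bézier strokes (the abstract does not state the curve degree), and it shows the standard SDS gradient with respect to a rendered image rather than the extended loss used in the paper. The names `cubic_bezier`, `sds_image_gradient`, and `predict_noise` are hypothetical; `predict_noise` stands in for a frozen text-conditioned diffusion model, and the weighting `1 - alpha_bar_t` is one common choice, assumed here for illustration.

```python
import numpy as np


def cubic_bezier(control_points, num_samples=50):
    """Sample points on a cubic Bezier curve.

    control_points: array-like of shape (4, 2); the cubic degree is an
    assumption made for this illustration.
    """
    p = np.asarray(control_points, dtype=float)
    t = np.linspace(0.0, 1.0, num_samples)[:, None]  # shape (num_samples, 1)
    # Degree-3 Bernstein basis weights.
    b0 = (1 - t) ** 3
    b1 = 3 * (1 - t) ** 2 * t
    b2 = 3 * (1 - t) * t ** 2
    b3 = t ** 3
    return b0 * p[0] + b1 * p[1] + b2 * p[2] + b3 * p[3]  # (num_samples, 2)


def sds_image_gradient(image, predict_noise, alpha_bar_t, rng):
    """Standard SDS gradient with respect to a rendered image.

    predict_noise is a hypothetical stand-in for a frozen diffusion model's
    noise prediction; a real pipeline would also pass the text prompt and
    the timestep, and would backpropagate this gradient through a
    differentiable rasterizer to the stroke parameters.
    """
    eps = rng.standard_normal(image.shape)
    # Forward diffusion: mix the rendered image with Gaussian noise.
    noisy = np.sqrt(alpha_bar_t) * image + np.sqrt(1.0 - alpha_bar_t) * eps
    eps_hat = predict_noise(noisy)
    weight = 1.0 - alpha_bar_t  # assumed weighting w(t)
    return weight * (eps_hat - eps)


# Usage: one stroke and one SDS step on a placeholder "render".
stroke = cubic_bezier([[0, 0], [1, 2], [3, 2], [4, 0]], num_samples=5)
rng = np.random.default_rng(0)
render = np.zeros((8, 8))
grad = sds_image_gradient(render, lambda x: np.zeros_like(x), 0.5, rng)
```

In a full pipeline the control points would be the optimized parameters, and the gradient above would reach them only through a differentiable vector renderer, which this sketch deliberately leaves out.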