Null-text Guidance in Diffusion Models is Secretly a Cartoon-style Creator

Classifier-free guidance is a widely adopted sampling technique for diffusion models. Its main idea is to extrapolate the model's prediction in the direction of the text guidance and away from the null-text guidance. In this paper, we demonstrate that null-text guidance in diffusion models is secretly a cartoon-style creator, i.e., generated images can be efficiently transformed into cartoons simply by perturbing the null-text guidance. Specifically, we propose two disturbance methods, Rollback disturbance (Back-D) and Image disturbance (Image-D), which construct a misalignment during sampling between the noisy images used to predict the null-text guidance and the text guidance (subsequently referred to as the \textbf{null-text noisy image} and the \textbf{text noisy image}, respectively). Back-D achieves cartoonization by altering the noise level of the null-text noisy image, replacing $x_t$ with $x_{t+\Delta t}$. Image-D instead produces high-fidelity, diverse cartoons by defining $x_t$ as a clean input image, which further improves the incorporation of fine image details. Through comprehensive experiments, we investigate the principle behind noise disturbance of the null-text guidance and find that the efficacy of a disturbance depends on the correlation between the null-text noisy image and the source image. Moreover, our proposed techniques, which can both generate cartoon images and cartoonize specific input images, are training-free and can be easily integrated as a plug-and-play component into any classifier-free guided diffusion model. The project page is available at \url{https://nulltextforcartoon.github.io/}.

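
To make the misalignment between the null-text noisy image and the text noisy image concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the common classifier-free guidance form $\epsilon = \epsilon_{\text{null}} + s\,(\epsilon_{\text{text}} - \epsilon_{\text{null}})$. The names `guided_noise` and `toy_eps_model`, the guidance scale, the timestep values, and the choice of which timestep is passed to the null-text branch are all hypothetical; the abstract does not specify them.

```python
import numpy as np


def guided_noise(eps_model, x_text, t_text, text_emb, null_emb, scale,
                 x_null=None, t_null=None):
    """Classifier-free guidance with an optionally perturbed null-text input.

    With x_null and t_null left as None, both branches see the same noisy
    image and this reduces to standard classifier-free guidance.
    """
    if x_null is None:
        x_null = x_text
    if t_null is None:
        t_null = t_text
    eps_text = eps_model(x_text, t_text, text_emb)   # text guidance
    eps_null = eps_model(x_null, t_null, null_emb)   # null-text guidance
    # Extrapolate toward the text prediction and away from the null-text one.
    return eps_null + scale * (eps_text - eps_null)


def toy_eps_model(x, t, cond):
    # Hypothetical stand-in for a trained noise predictor, for illustration only.
    return 0.5 * x + 0.01 * t + cond.mean()


rng = np.random.default_rng(0)
x_t = rng.standard_normal((4, 4))        # text noisy image at step t
x_t_plus = rng.standard_normal((4, 4))   # stand-in for x_{t + Delta t}
x_clean = rng.standard_normal((4, 4))    # stand-in for a clean input image
text_emb = np.ones(8)                    # placeholder text embedding
null_emb = np.zeros(8)                   # placeholder null-text embedding

# Standard guidance: both branches use x_t.
standard = guided_noise(toy_eps_model, x_t, 10, text_emb, null_emb, 7.5)

# Back-D style: the null-text branch sees the noisier x_{t + Delta t}.
# Passing t_null=12 is an assumption of this sketch.
back_d = guided_noise(toy_eps_model, x_t, 10, text_emb, null_emb, 7.5,
                      x_null=x_t_plus, t_null=12)

# Image-D style: the null-text branch sees a clean input image.
# Keeping the same timestep for this branch is an assumption of this sketch.
image_d = guided_noise(toy_eps_model, x_t, 10, text_emb, null_emb, 7.5,
                       x_null=x_clean)
```

In a real sampler, `x_t_plus` would be a latent saved from an earlier (noisier) sampling step rather than fresh noise, and `eps_model` would be the pretrained denoising network; only the input to the null-text branch changes, which is why the approach requires no training.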