Diffusion models have demonstrated remarkable success in high-fidelity image synthesis and prompt-guided generative modeling. However, ensuring adequate diversity in the samples generated by prompt-guided diffusion models remains a challenge, particularly when the prompts span a broad semantic spectrum and the diversity of the generated data must be evaluated in a prompt-aware fashion across semantically similar prompts. Recent methods have introduced guidance based on diversity measures to encourage more varied generations. In this work, we extend these diversity-measure-based approaches by proposing the Scalable Prompt-Aware R\'enyi Kernel Entropy Diversity Guidance (SPARKE) method for prompt-aware diversity guidance. SPARKE uses conditional entropy for diversity guidance, which dynamically conditions the diversity measurement on similar prompts and enables prompt-aware diversity control. While the entropy-based guidance approach enhances prompt-aware diversity, its reliance on matrix-based entropy scores poses computational challenges in large-scale generation settings. To address this, we focus on the special case of Conditional latent RKE Score Guidance, which reduces the complexity of entropy computation and gradient-based optimization from the $O(n^3)$ of general entropy measures to $O(n)$. The reduced computational complexity allows for diversity-guided sampling over potentially thousands of generation rounds on different prompts. We numerically test the SPARKE method on several text-to-image diffusion models, demonstrating that it improves the prompt-aware diversity of the generated data without incurring significant computational costs. We release our code on the project page: https://mjalali.github.io/SPARKE
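The complexity reduction claimed above can be illustrated with a minimal NumPy sketch. It is not the released SPARKE code, and it rests on three assumptions that the abstract does not state: that the order-2 (RKE) score equals the inverse squared Frobenius norm of the kernel matrix divided by $n$; that prompt-awareness enters through an elementwise product of a sample kernel and a prompt kernel; and that both kernels are Gaussian with $k(z, z) = 1$. All function names and the bandwidth `sigma` are hypothetical. The sketch shows that the order-2 score needs no eigendecomposition, and that appending one new sample changes the sum of squared kernel entries using only the $n$ kernel values against earlier samples.

```python
import numpy as np


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel matrix between the rows of x (m, d) and y (n, d)."""
    sq = (np.sum(x**2, axis=1)[:, None]
          + np.sum(y**2, axis=1)[None, :]
          - 2.0 * x @ y.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def rke_from_eigenvalues(k):
    """Order-2 score from the eigenvalues of K/n (cubic cost in n)."""
    eig = np.linalg.eigvalsh(k / k.shape[0])
    return 1.0 / np.sum(eig**2)


def rke_from_frobenius(k):
    """Same score from the squared Frobenius norm of K/n (no eigendecomposition)."""
    return 1.0 / np.sum((k / k.shape[0]) ** 2)


def joint_sq_sum_update(sq_sum, kx_new, kt_new):
    """Update the sum of squared entries of the joint kernel (sample kernel
    times prompt kernel, elementwise) after appending one sample.

    kx_new, kt_new: kernel values between the new sample / new prompt and
    the n earlier ones, each of shape (n,). Assumes k(z, z) = 1, so the new
    diagonal entry contributes exactly 1. Cost is linear in n.
    """
    cross = kx_new * kt_new
    return sq_sum + 2.0 * np.sum(cross**2) + 1.0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latents = rng.normal(size=(8, 16))   # stand-ins for generated latents
    prompts = rng.normal(size=(8, 4))    # stand-ins for prompt embeddings
    joint = gaussian_kernel(latents, latents) * gaussian_kernel(prompts, prompts)
    print(rke_from_eigenvalues(joint), rke_from_frobenius(joint))
```

The two score functions agree because the sum of squared eigenvalues of a symmetric matrix equals its squared Frobenius norm. In a guidance loop, only the `cross` term depends on the sample currently being generated, which is why a gradient step under these assumptions touches $n$ kernel values rather than the full matrix.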