Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features

Diffusion models were originally designed for image generation. Recent research shows that the internal signals within their backbones, called activations, can also serve as dense features for various discriminative tasks such as semantic segmentation. Given the large number of activations, selecting a small yet effective subset is a fundamental problem. To this end, an early study in this field performed a large-scale quantitative comparison of the discriminative ability of the activations. However, we find that many potential activations have not been evaluated, such as the queries and keys used to compute attention scores. Moreover, recent advances in diffusion architectures introduce many new activations, such as those within embedded ViT modules. Taken together, activation selection remains unresolved yet overlooked. To tackle this issue, this paper takes a further step and evaluates a much broader range of activations. Given the significant increase in the number of activations, a full-scale quantitative comparison is no longer feasible. Instead, we seek to understand the properties of these activations, so that clearly inferior activations can be filtered out in advance via simple qualitative evaluation. After careful analysis, we discover three properties that are universal among diffusion models, enabling this study to go beyond specific models. Building on this, we present effective feature selection solutions for several popular diffusion models. Finally, experiments across multiple discriminative tasks validate the superiority of our method over state-of-the-art (SOTA) competitors. Our code is available at https://github.com/Darkbblue/generic-diffusion-feature.
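
To make the notion of an activation concrete, the following is a minimal toy sketch and not the authors' implementation: a single-head self-attention layer in plain Python that records every intermediate signal, including the queries and keys mentioned above. The function names, the `acts` dictionary, and the identity weight matrices are all assumptions made for illustration; a real diffusion backbone would expose such signals through its own framework (for example, via hooks), and the abstract does not specify which activations the method ultimately selects.

```python
import math


def matvec(weights, x):
    """Multiply a weight matrix (a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_with_activations(tokens, w_q, w_k, w_v):
    """Toy single-head self-attention that records each intermediate signal.

    Hypothetical helper for illustration only: every entry stored in `acts`
    is one candidate "activation" that could be used as a feature.
    """
    acts = {}
    acts["query"] = [matvec(w_q, t) for t in tokens]
    acts["key"] = [matvec(w_k, t) for t in tokens]
    acts["value"] = [matvec(w_v, t) for t in tokens]

    # Scaled dot-product attention: softmax(q . k / sqrt(d)) for each query.
    scale = math.sqrt(len(acts["key"][0]))
    acts["attn_scores"] = [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in acts["key"]])
        for q in acts["query"]
    ]

    # Each output token is the attention-weighted sum of the value vectors.
    dim = len(acts["value"][0])
    acts["output"] = [
        [sum(p * v[d] for p, v in zip(probs, acts["value"])) for d in range(dim)]
        for probs in acts["attn_scores"]
    ]
    return acts


# Three toy "tokens" (e.g. image patches) with two channels each.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]  # assumed weights, chosen for readability

acts = attention_with_activations(tokens, identity, identity, identity)

# "Activation selection" then amounts to choosing a subset of these signals.
selected = {name: acts[name] for name in ("query", "key")}
```

Even this single toy layer yields five distinct activations; a full backbone with many such layers makes it clear why exhaustive quantitative comparison becomes costly.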
