The human brain responds to semantic features of presented stimuli with different neurons. It is natural to ask whether modern deep neural networks exhibit a similar behavior pattern. Specifically, this paper finds a small cluster of neurons in a diffusion model that corresponds to a particular subject. We call these neurons concept neurons. They can be identified from statistics of the network gradients with respect to a stimulus associated with the given subject. The concept neurons demonstrate magnetic properties in interpreting and manipulating generation results. Shutting them off directly yields the related subject contextualized in different scenes. Concatenating multiple clusters of concept neurons can vividly generate all related concepts in a single image. A few steps of further fine-tuning enhance the multi-concept capability; to our knowledge, this may be the first method to generate up to four different subjects in a single image. For large-scale applications, the concept neurons are environmentally friendly, as we only need to store a sparse cluster of integer indices instead of dense float32 parameter values, which reduces storage consumption by 90% compared with previous subject-driven generation methods. Extensive qualitative and quantitative studies on diverse scenarios show the superiority of our method in interpreting and manipulating diffusion models.
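The mechanics described in the abstract (identify neurons from gradient statistics, shut them off, concatenate clusters, store only indices) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract says only that "statistics of network gradients" are used, so the selection rule below (mean of parameter times gradient across steps being positive) is a stand-in, the function names `find_concept_neurons`, `shut_off`, and `merge_clusters` are hypothetical, and flat Python lists replace the tensors of a real diffusion model.

```python
from array import array


def find_concept_neurons(params, grads_per_step, threshold=0.0):
    """Return indices whose mean (param * grad) over steps exceeds threshold.

    ASSUMPTION: this scoring rule is illustrative only; the abstract does
    not specify which gradient statistic is used.
    """
    n_steps = len(grads_per_step)
    selected = []
    for i, p in enumerate(params):
        mean_score = sum(p * g[i] for g in grads_per_step) / n_steps
        if mean_score > threshold:
            selected.append(i)
    # Only integer indices are kept, not the parameter values themselves.
    return array("i", selected)


def shut_off(params, indices):
    """Return a copy of params with the selected entries set to zero."""
    out = list(params)
    for i in indices:
        out[i] = 0.0
    return out


def merge_clusters(*clusters):
    """Combine several index clusters into one sorted, de-duplicated cluster."""
    return array("i", sorted(set().union(*clusters)))


# Toy parameters and two steps of toy gradients (made-up numbers).
params = [0.5, -0.2, 0.8, 0.1, -0.6, 0.3]
grads_per_step = [
    [0.4, 0.1, -0.3, 0.0, -0.5, -0.1],
    [0.2, 0.3, 0.1, 0.0, -0.7, -0.2],
]

subject_a = find_concept_neurons(params, grads_per_step)
silenced = shut_off(params, subject_a)

# A second, hand-written cluster standing in for another subject.
subject_b = array("i", [2, 4])
merged = merge_clusters(subject_a, subject_b)

# Compare the bytes needed for dense float32 values and for the index cluster.
dense_bytes = len(params) * array("f").itemsize
sparse_bytes = len(subject_a) * subject_a.itemsize

print(list(subject_a), silenced, list(merged))
print(dense_bytes, sparse_bytes)
```

In this toy setting the saving comes purely from storing fewer entries than there are parameters; the 90% figure in the abstract is the authors' measurement against prior methods and is not reproduced by this sketch.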