The success of text-guided diffusion models has inspired the development and release of numerous powerful diffusion models within the open-source community. These models are typically fine-tuned on various expert datasets, showcasing diverse denoising capabilities. Leveraging multiple high-quality models to achieve stronger generation ability is valuable, but has not been extensively studied. Existing methods primarily adopt parameter merging strategies to produce a new static model. However, they overlook the fact that the divergent denoising capabilities of the models may change dynamically across different states, such as different prompts, initial noises, denoising steps, and spatial locations. In this paper, we propose a novel ensembling method, Adaptive Feature Aggregation (AFA), which dynamically adjusts the contributions of multiple models at the feature level according to various states (i.e., prompts, initial noises, denoising steps, and spatial locations), thereby keeping the advantages of multiple diffusion models while suppressing their disadvantages. Specifically, we design a lightweight Spatial-Aware Block-Wise (SABW) feature aggregator that adaptively aggregates the block-wise intermediate features from multiple U-Net denoisers into a unified one. The core idea lies in dynamically producing an individual attention map for each model's features by comprehensively considering various states. It is worth noting that only SABW is trainable, with about 50 million parameters, while the other models are frozen. Both quantitative and qualitative experiments demonstrate the effectiveness of our proposed Adaptive Feature Aggregation method. The code is available at https://github.com/tenvence/afa/.
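The aggregation step described above can be sketched in a few lines: at each spatial location, a softmax over the per-model attention scores yields weights that blend the models' block-wise features into a unified one. This is a minimal numpy sketch, not the paper's actual implementation; the function names are illustrative, and in practice the attention logits would be produced by the trainable SABW network conditioned on the prompt, initial noise, and denoising step.

```python
import numpy as np

def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sabw_aggregate(feats, attn_logits):
    """Blend block-wise features from K frozen denoisers.

    feats:       (K, C, H, W) intermediate features, one per model.
    attn_logits: (K, H, W) per-model spatial scores; in the paper these
                 would come from the trainable SABW aggregator, which
                 conditions on prompt, noise, and timestep (assumed here).
    Returns a unified (C, H, W) feature map.
    """
    attn = softmax(attn_logits, axis=0)        # weights over K models per location
    return (attn[:, None] * feats).sum(axis=0) # spatially adaptive weighted sum
```

With uniform logits this reduces to averaging the models' features; as the logits diverge, each spatial location can lean on whichever model denoises it best.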