We introduce X-Adapter, a universal upgrader that enables pretrained plug-and-play modules (e.g., ControlNet, LoRA) to work directly with an upgraded text-to-image diffusion model (e.g., SDXL) without further retraining. We achieve this by training an additional network to control the frozen upgraded model using new text-image data pairs. Specifically, X-Adapter keeps a frozen copy of the old model to preserve the connectors of different plugins. In addition, X-Adapter adds trainable mapping layers that bridge the decoders of the two model versions for feature remapping; the remapped features then serve as guidance for the upgraded model. To strengthen this guidance, we employ a null-text training strategy for the upgraded model. After training, we further introduce a two-stage denoising strategy to align the initial latents of X-Adapter and the upgraded model. Thanks to these strategies, X-Adapter demonstrates universal compatibility with various plugins and also enables plugins of different versions to work together, thereby expanding the functionality available to the diffusion community. To verify the effectiveness of the proposed method, we conduct extensive experiments; the results show that X-Adapter may facilitate wider application of the upgraded foundational diffusion model.
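The feature-remapping guidance and the null-text training idea described above can be illustrated with a toy, framework-free Python sketch. It is not the authors' implementation: the dense `MappingLayer`, its zero initialization, the additive fusion in `guide_upgraded_feature`, and the `p_null` probability in `null_text_prompts` are all hypothetical choices made for illustration, and real decoder features would be spatial tensors rather than flat vectors.

```python
import random
from typing import List, Sequence

Vector = List[float]


class MappingLayer:
    """Hypothetical trainable mapping layer (a plain linear map).

    Maps a feature from the frozen base-model decoder (width in_dim) to the
    width of the upgraded-model decoder (out_dim). Weights start at zero, an
    assumed ControlNet-style choice, so the guidance is initially a no-op.
    """

    def __init__(self, in_dim: int, out_dim: int) -> None:
        self.weight: List[List[float]] = [[0.0] * in_dim for _ in range(out_dim)]
        self.bias: List[float] = [0.0] * out_dim

    def __call__(self, x: Sequence[float]) -> Vector:
        # y = W x + b, computed one output row at a time.
        return [
            sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(self.weight, self.bias)
        ]


def guide_upgraded_feature(
    upgraded_feature: Sequence[float],
    base_feature: Sequence[float],
    mapper: MappingLayer,
) -> Vector:
    """Remap a frozen base-decoder feature and add it to the upgraded feature.

    Additive fusion is an assumption of this sketch.
    """
    remapped = mapper(base_feature)
    if len(remapped) != len(upgraded_feature):
        raise ValueError("mapped width must match the upgraded decoder width")
    return [u + r for u, r in zip(upgraded_feature, remapped)]


def null_text_prompts(
    prompts: Sequence[str], p_null: float, rng: random.Random
) -> List[str]:
    """Prompts fed to the upgraded model during training.

    Each prompt is replaced by the empty string with probability p_null
    (a hypothetical knob; p_null=1.0 always drops the text), so the upgraded
    model has to rely on the adapter's guidance. The frozen base model would
    still receive the original prompts.
    """
    return ["" if rng.random() < p_null else p for p in prompts]


if __name__ == "__main__":
    mapper = MappingLayer(in_dim=2, out_dim=3)
    # Zero-initialized mapper: the upgraded feature passes through unchanged.
    print(guide_upgraded_feature([1.0, 2.0, 3.0], [5.0, 7.0], mapper))
    # After "training" (weights set by hand here), guidance is injected.
    mapper.weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(guide_upgraded_feature([1.0, 2.0, 3.0], [5.0, 7.0], mapper))
    print(null_text_prompts(["a cat", "a dog"], 1.0, random.Random(0)))
```

Starting the mapping at zero means that, before training, the upgraded model behaves exactly as it would without the adapter; dropping its text prompt is one plausible way to push it to depend on the remapped features instead.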