We introduce X-Adapter, a universal upgrader that enables pretrained plug-and-play modules (e.g., ControlNet, LoRA) to work directly with an upgraded text-to-image diffusion model (e.g., SDXL) without further retraining. We achieve this by training an additional network to control the frozen upgraded model using new text-image data pairs. Specifically, X-Adapter keeps a frozen copy of the old model to preserve the connectors of different plugins. In addition, X-Adapter adds trainable mapping layers that bridge the decoders of the two model versions for feature remapping, and the remapped features serve as guidance for the upgraded model. To enhance the guidance ability of X-Adapter, we employ a null-text training strategy for the upgraded model. After training, we also introduce a two-stage denoising strategy to align the initial latents of X-Adapter and the upgraded model. With these strategies, X-Adapter demonstrates universal compatibility with various plugins and also enables plugins of different versions to work together, thereby expanding the functionality available to the diffusion community. To verify the effectiveness of the proposed method, we conduct extensive experiments, and the results show that X-Adapter may facilitate wider application of the upgraded foundational diffusion model.
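The feature-remapping idea can be illustrated with a minimal sketch. It assumes that each mapping layer is a 1x1 convolution followed by nearest-neighbour upsampling, and that guidance is injected by adding the remapped base-decoder feature to the matching upgraded-decoder feature. `MappingLayer` and `guide_upgraded_decoder` are hypothetical names, the shapes are toy values rather than the real channel counts of SD1.5 or SDXL, and the mapping layers in X-Adapter itself may be structured differently.

```python
import numpy as np


class MappingLayer:
    """Hypothetical mapping layer (assumption: 1x1 conv + nearest upsampling).

    Maps a frozen base-model decoder feature of shape (C_in, H, W) to the
    shape (C_out, H*scale, W*scale) of the matching upgraded-decoder feature.
    """

    def __init__(self, c_in: int, c_out: int, scale: int, rng: np.random.Generator):
        # These would be the trainable parameters; here they are only randomly initialised.
        self.weight = rng.standard_normal((c_out, c_in)) * 0.02
        self.bias = np.zeros(c_out)
        self.scale = scale

    def __call__(self, feat: np.ndarray) -> np.ndarray:
        # A 1x1 convolution is a per-pixel linear map over the channel axis.
        projected = np.einsum("oc,chw->ohw", self.weight, feat) + self.bias[:, None, None]
        # Nearest-neighbour upsampling to the upgraded model's spatial resolution.
        return projected.repeat(self.scale, axis=1).repeat(self.scale, axis=2)


def guide_upgraded_decoder(upgraded_feats, base_feats, mappers):
    """Add each remapped base-decoder feature to the matching upgraded-decoder feature."""
    return [u + m(b) for u, b, m in zip(upgraded_feats, base_feats, mappers)]


# Toy usage: two decoder stages, base features at half the upgraded resolution.
rng = np.random.default_rng(0)
base = [rng.standard_normal((8, 4, 4)), rng.standard_normal((4, 8, 8))]
upgraded = [rng.standard_normal((16, 8, 8)), rng.standard_normal((8, 16, 16))]
mappers = [MappingLayer(8, 16, 2, rng), MappingLayer(4, 8, 2, rng)]
guided = guide_upgraded_decoder(upgraded, base, mappers)
print([g.shape for g in guided])
```

In this sketch only the mapping layers carry parameters, which mirrors the setup described above: both the old model that feeds the plugins and the upgraded model stay frozen, and only the bridge between their decoders would be trained.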