We introduce Nexus Adapters, novel text-guided, efficient adapters for diffusion-based Structure-Preserving Conditional Generation (SPCG). Recently, structure-preserving methods have achieved promising results in conditional image generation by using a base model for prompt conditioning and an adapter for structural input, such as sketches or depth maps. These approaches are highly inefficient: the adapter sometimes requires as many parameters as the base architecture. Training such a model is not always feasible, since the diffusion model is itself costly and doubling the parameter count compounds that cost. Moreover, the adapter in these approaches is not aware of the input prompt, so it is optimized only for the structural input and not for the prompt. To overcome these challenges, we propose two efficient adapters, Nexus Prime and Nexus Slim, which are guided by both prompts and structural inputs. Each Nexus Block incorporates cross-attention mechanisms to enable rich multimodal conditioning, giving the proposed adapters a better understanding of the input prompt while preserving the structure. Extensive experiments demonstrate that the Nexus Prime adapter significantly enhances performance while requiring only 8M additional parameters compared to the baseline, T2I-Adapter. We also introduce a lightweight Nexus Slim adapter with 18M fewer parameters than T2I-Adapter that still achieves state-of-the-art results. Code: https://github.com/arya-domain/Nexus-Adapters
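As a rough illustration of the cross-attention conditioning mentioned above, the following minimal sketch fuses structural features with prompt-token embeddings using single-head cross-attention and a residual connection. It is not the authors' implementation: the choice of structural features as queries and prompt tokens as keys/values, the residual add, and all names (`cross_attention`, `structure_feats`, `prompt_tokens`, `w_q`, `w_k`, `w_v`) and dimensions are assumptions made for illustration. The actual Nexus Block design is in the linked repository.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(structure_feats, prompt_tokens, w_q, w_k, w_v):
    """Single-head cross-attention with a residual connection.

    ASSUMPTION: queries come from the structural features and keys/values
    from the prompt tokens; the abstract does not specify this layout.
    """
    q = matmul(structure_feats, w_q)  # (n_struct, d)
    k = matmul(prompt_tokens, w_k)    # (n_prompt, d)
    v = matmul(prompt_tokens, w_v)    # (n_prompt, d)
    scale = 1.0 / math.sqrt(len(q[0]))
    # Each structural position scores every prompt token.
    k_t = [list(col) for col in zip(*k)]
    scores = [[s * scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    attended = matmul(weights, v)
    # The residual add keeps the structural signal and mixes in prompt context.
    fused = [[s + a for s, a in zip(s_row, a_row)]
             for s_row, a_row in zip(structure_feats, attended)]
    return fused, weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned weights or encoder outputs."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d_model = 8  # illustrative width, not taken from the paper
structure_feats = random_matrix(4, d_model, rng)  # e.g. 4 sketch-feature positions
prompt_tokens = random_matrix(3, d_model, rng)    # e.g. 3 text-token embeddings
w_q, w_k, w_v = (random_matrix(d_model, d_model, rng) for _ in range(3))

fused, weights = cross_attention(structure_feats, prompt_tokens, w_q, w_k, w_v)
```

Here `fused` has the same shape as `structure_feats`, so under these assumptions a prompt-aware block could replace a purely structural one without changing the shape of the features passed on to the base model. Each row of `weights` is a distribution over prompt tokens for one structural position.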