Spatial conditioning in pretrained text-to-image diffusion models has significantly improved fine-grained control over the structure of generated images. However, existing control adapters adapt poorly and are costly to train when faced with novel spatial control conditions that differ substantially from their training tasks. To address this limitation, we propose Universal Few-Shot Control (UFC), a versatile few-shot control adapter capable of generalizing to novel spatial conditions. Given a few image-condition pairs of an unseen task and a query condition, UFC leverages the analogy between the query and support conditions to construct task-specific control features, which it realizes through a matching mechanism and an update to a small set of task-specific parameters. Experiments on six novel spatial control tasks show that UFC, fine-tuned with only 30 annotated examples of a novel task, achieves fine-grained control consistent with the spatial conditions. Notably, when fine-tuned with 0.1% of the full training data, UFC performs competitively with fully supervised baselines on various control tasks. We also show that UFC is agnostic to the diffusion backbone and demonstrate its effectiveness on both UNet and DiT architectures. Code is available at https://github.com/kietngt00/UFC.
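The matching idea in the abstract, in which a query condition is related to support conditions in order to build control features, can be pictured with a small attention-style sketch. This is an illustrative toy, not the UFC implementation: the function name `match_control_features`, the representation of tokens as short lists of floats, the scaled dot-product similarity, and the `temperature` parameter are all assumptions made for the example. Each query-condition token is compared against every support-condition token, and the paired support-image tokens are averaged with the resulting weights.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def match_control_features(query_cond, support_cond, support_img, temperature=1.0):
    """Hypothetical matching step (assumed form, not taken from the paper).

    query_cond:   list of query-condition token vectors
    support_cond: list of support-condition token vectors (used as keys)
    support_img:  list of support-image token vectors (used as values),
                  paired one-to-one with support_cond
    Returns one control-feature vector per query token.
    """
    if len(support_cond) != len(support_img):
        raise ValueError("each support condition token needs a paired image token")
    scale = temperature * math.sqrt(len(query_cond[0]))
    out_dim = len(support_img[0])
    control = []
    for q in query_cond:
        # Similarity of this query token to every support-condition token.
        weights = softmax([dot(q, k) / scale for k in support_cond])
        # Weighted average of the paired support-image tokens.
        control.append(
            [sum(w * v[d] for w, v in zip(weights, support_img)) for d in range(out_dim)]
        )
    return control


# Toy data: two support pairs and two query tokens, each aligned with one pair.
support_cond = [[1.0, 0.0], [0.0, 1.0]]
support_img = [[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]
query_cond = [[8.0, 0.0], [0.0, 8.0]]
control = match_control_features(query_cond, support_cond, support_img)
```

In this toy, each query token resembles one support condition, so its control feature is dominated by the image token paired with that condition. The update to task-specific parameters mentioned in the abstract is not modeled here.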