Diffusion-based image synthesis has attracted extensive attention recently. In particular, ControlNet, which uses image-based prompts, exhibits powerful capability in image tasks such as Canny edge detection and generates images well aligned with these prompts. However, vanilla ControlNet generally requires extensive training of around 5000 steps to achieve desirable control for a single task. Recent in-context learning approaches have improved its adaptability, but mainly for edge-based tasks, and they rely on paired examples. Thus, two important open issues remain to be addressed to reach the full potential of ControlNet: (i) zero-shot control for certain tasks and (ii) faster adaptation for non-edge-based tasks. In this paper, we introduce a novel Meta ControlNet method, which adopts a task-agnostic meta-learning technique and features a new layer-freezing design. Meta ControlNet significantly reduces the number of training steps needed to attain control ability from 5000 to 1000. Further, Meta ControlNet exhibits direct zero-shot adaptability in edge-based tasks without any finetuning, and achieves control within only 100 finetuning steps in more complex non-edge tasks such as Human Pose, outperforming all existing methods. The code is available at https://github.com/JunjieYang97/Meta-ControlNet.
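The general idea of combining task-agnostic meta-learning with layer freezing can be illustrated with a deliberately small sketch. It assumes Reptile (a first-order, task-agnostic meta-learning algorithm) as a stand-in; the abstract does not state which update rule Meta ControlNet uses. A toy linear model `y = w*x + b` plays the role of ControlNet, each task (a different slope `a`) plays the role of a control condition such as Canny or Human Pose, and keeping `b` fixed mimics a layer-freezing design. All function names (`mse`, `inner_adapt`, `reptile_meta_train`, `make_task`) and hyperparameters are hypothetical, and which layers the paper actually freezes is not specified here.

```python
import random

# Hypothetical toy setup: y_hat = w * x + b stands in for a ControlNet,
# and each "task" (a different slope) stands in for a control condition.
# Reptile is an assumed, illustrative meta-learning rule, not the paper's.


def mse(params, xs, ys):
    """Mean squared error of the toy linear model on one task."""
    return sum((params["w"] * x + params["b"] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(params, xs, ys):
    """Analytic gradient of `mse` with respect to w and b."""
    n = len(xs)
    gw = gb = 0.0
    for x, y in zip(xs, ys):
        err = params["w"] * x + params["b"] - y
        gw += 2.0 * err * x / n
        gb += 2.0 * err / n
    return {"w": gw, "b": gb}


def inner_adapt(params, task, frozen, lr=0.05, steps=5):
    """A few SGD steps on one task; names in `frozen` are never updated."""
    xs, ys = task
    adapted = dict(params)  # copy so the meta-parameters stay untouched
    for _ in range(steps):
        grads = mse_grad(adapted, xs, ys)
        for name, g in grads.items():
            if name not in frozen:
                adapted[name] -= lr * g
    return adapted


def reptile_meta_train(params, tasks, frozen, meta_lr=0.1, meta_iters=200, seed=0):
    """Reptile outer loop: nudge trainable params toward task-adapted params."""
    rng = random.Random(seed)
    params = dict(params)
    for _ in range(meta_iters):
        task = rng.choice(tasks)
        adapted = inner_adapt(params, task, frozen)
        for name in params:
            if name not in frozen:  # frozen "layers" keep their initial value
                params[name] += meta_lr * (adapted[name] - params[name])
    return params


# Usage: meta-train on three slopes, then adapt to an unseen slope.
XS = [-1.0, -0.5, 0.0, 0.5, 1.0]


def make_task(a, c=1.0):
    """Hypothetical task: targets y = a * x + c on a fixed grid."""
    return XS, [a * x + c for x in XS]


tasks = [make_task(a) for a in (1.0, 2.0, 3.0)]
init = {"w": 0.0, "b": 1.0}  # "b" plays the role of a frozen pretrained layer
meta = reptile_meta_train(init, tasks, frozen={"b"})

new_task = make_task(2.5)
loss_from_meta = mse(inner_adapt(meta, new_task, {"b"}), *new_task)
loss_from_init = mse(inner_adapt(init, new_task, {"b"}), *new_task)
print(meta, loss_from_meta, loss_from_init)
```

The outer loop only learns an initialization, so an unseen task still needs a few inner steps; starting from the meta-learned `w`, those steps end closer to the task's optimum than starting from scratch. Frozen parameters are skipped in both loops, so they stay identical to their initial (pretrained) values.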