We present ControlNet, a neural network architecture that controls pretrained large diffusion models so that they support additional input conditions. ControlNet learns task-specific conditions end to end, and learning remains robust even when the training dataset is small (fewer than 50k samples). Moreover, training a ControlNet is as fast as fine-tuning a diffusion model, and the model can be trained on personal devices. Alternatively, if powerful computation clusters are available, the model can scale to large amounts of data (millions to billions of samples). We report that large diffusion models such as Stable Diffusion can be augmented with ControlNets to accept conditional inputs such as edge maps, segmentation maps, and keypoints. This may enrich the methods for controlling large diffusion models and further facilitate related applications.
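The abstract does not spell out how a pretrained model is "controlled" without degrading it. One plausible design, assumed here purely for illustration, is to keep the pretrained block frozen, train a copy of it that also receives the extra condition, and connect that copy through zero-initialized layers. Training then starts from exactly the pretrained behavior. The toy below stands in a single linear map for a diffusion-model block and uses plain Python lists. `ControlledBlock`, `zero_in`, `zero_out`, and the other names are hypothetical and are not ControlNet's actual code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def add(a: Vector, b: Vector) -> Vector:
    """Element-wise vector addition."""
    return [ai + bi for ai, bi in zip(a, b)]


class ControlledBlock:
    """Hypothetical toy: a frozen linear 'block' plus a trainable copy.

    Assumption: the condition enters the copy through a zero-initialized
    projection, and the copy's output is added back through another
    zero-initialized projection. A square weight matrix is assumed.
    """

    def __init__(self, frozen_weights: Matrix) -> None:
        n = len(frozen_weights)
        # Pretrained weights; in this sketch they are never updated.
        self.frozen = [row[:] for row in frozen_weights]
        # Trainable copy, initialized from the pretrained weights.
        self.trainable = [row[:] for row in frozen_weights]
        # Zero-initialized projections for the condition and the branch output.
        self.zero_in = [[0.0] * n for _ in range(n)]
        self.zero_out = [[0.0] * n for _ in range(n)]

    def __call__(self, x: Vector, cond: Vector) -> Vector:
        # Output of the frozen pretrained block.
        base = matvec(self.frozen, x)
        # Trainable copy sees the input plus the projected condition.
        branch_in = add(x, matvec(self.zero_in, cond))
        branch = matvec(self.trainable, branch_in)
        # Branch output is added back through the zero-initialized projection.
        return add(base, matvec(self.zero_out, branch))


block = ControlledBlock([[1.0, 2.0], [3.0, 4.0]])
# At initialization the condition has no effect: output equals the frozen block's.
print(block([1.0, 1.0], [5.0, -2.0]))
```

Because both projections start at zero, the augmented block initially reproduces the pretrained output, and the condition only influences the result once training moves those projections away from zero. This is one way to reconcile adding new inputs with preserving what the large model already learned.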