We develop a neural network architecture that, trained in an unsupervised manner as a denoising diffusion model, learns to generate and segment images simultaneously. Learning is driven entirely by the denoising diffusion objective, without any annotation or prior knowledge about regions during training. A computational bottleneck built into the neural architecture encourages the denoising network to partition an input into regions, denoise them in parallel, and combine the results. Our trained model generates synthetic images and, by simple examination of its internal predicted partitions, a semantic segmentation of those images. Without any finetuning, we apply our unsupervised model directly to the downstream task of segmenting real images by first noising and then denoising them. Experiments demonstrate that our model achieves accurate unsupervised image segmentation and high-quality synthetic image generation across multiple datasets.
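The partition, denoise, and combine bottleneck, and the noise-then-denoise route to segmenting a real image, can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the authors' architecture: images are flattened lists of grayscale values, a hypothetical `mask_logits_fn` gives each pixel one logit per region, a softmax turns those logits into soft masks that sum to one per pixel, each hypothetical per-region denoiser predicts a clean image, and the network output is the mask-weighted sum of those predictions. Noising uses the standard DDPM forward process x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps, and the segmentation is read off as the per-pixel argmax of the masks.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def add_noise(x0, alpha_bar, eps):
    """Standard DDPM forward process: x_t = sqrt(alpha_bar)*x0 + sqrt(1-alpha_bar)*eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * v + b * e for v, e in zip(x0, eps)]


def bottleneck_denoise(x_t, mask_logits_fn, region_denoisers):
    """Toy bottleneck (hypothetical): partition -> denoise per region -> combine.

    mask_logits_fn(x_t) returns, for each pixel, one logit per region.
    region_denoisers[k](x_t) returns a full-image clean prediction for region k.
    """
    masks = [softmax(l) for l in mask_logits_fn(x_t)]  # masks[i][k]; sums to 1 over k
    preds = [f(x_t) for f in region_denoisers]          # preds[k][i]; independent, so parallelizable
    combined = [
        sum(masks[i][k] * preds[k][i] for k in range(len(preds)))
        for i in range(len(x_t))
    ]
    return combined, masks


def segment(masks):
    """Read the segmentation off the internal partition: argmax region per pixel."""
    return [max(range(len(m)), key=lambda k: m[k]) for m in masks]


if __name__ == "__main__":
    # Hypothetical two-region "image": dark left half, bright right half.
    x0 = [0.0] * 4 + [1.0] * 4
    rng = random.Random(0)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = add_noise(x0, 0.9, eps)

    # Hand-written placeholders standing in for learned networks.
    logits_fn = lambda x: [[-10.0 * v ** 2, -10.0 * (v - 1.0) ** 2] for v in x]
    denoisers = [lambda x: [0.0] * len(x), lambda x: [1.0] * len(x)]

    combined, masks = bottleneck_denoise(x_t, logits_fn, denoisers)
    print("denoised:", [round(c, 3) for c in combined])
    print("segmentation:", segment(masks))
```

In a trained model, the mask logits and region denoisers would be learned jointly under the diffusion objective; the hand-written placeholders here only make the data flow visible. Because every pixel's masks sum to one, the combined prediction is a convex mixture of the per-region predictions, which is what lets the masks double as a segmentation.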