Given a large training dataset, generative adversarial networks (GANs) can achieve remarkable performance on image synthesis. However, training GANs in extremely low-data regimes remains a challenge, as overfitting often occurs, leading to memorization or training divergence. In this work, we introduce SIV-GAN, an unconditional generative model that can generate new scene compositions from a single training image or a single video clip. We propose a two-branch discriminator architecture, with content and layout branches designed to judge the realism of internal content and of scene layout separately. This discriminator design enables synthesis of visually plausible, novel compositions of a scene, with varying content and layout, while preserving the context of the original sample. Compared to previous single-image GANs, our model generates more diverse, higher-quality images, while not being restricted to a single-image setting. We further introduce a new, challenging task of learning from a few frames of a single video. In this setup, the training images are highly similar to each other, which makes it difficult for prior GAN models to achieve both high quality and diversity in synthesis.
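The separation between content and layout judgments can be illustrated with a small sketch. This is not the SIV-GAN implementation: the abstract does not say how the branches are built, so the example assumes that a content view discards spatial positions (here by averaging each channel over the whole map) and that a layout view discards channel identity (here by averaging across channels at each position). The function names `content_descriptor`, `layout_descriptor`, and `flip_horizontal` are hypothetical, and plain averages stand in for whatever learned layers the actual model uses.

```python
from typing import List

# A feature map stored as nested lists: [channels][height][width].
FeatureMap = List[List[List[float]]]


def content_descriptor(features: FeatureMap) -> List[float]:
    """Average each channel over all spatial positions.

    Assumption for illustration: the result has one value per channel,
    so it says what is present but not where it is.
    """
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in features
    ]


def layout_descriptor(features: FeatureMap) -> List[List[float]]:
    """Average across channels at every spatial position.

    Assumption for illustration: the result is an H x W map, so it says
    where activity is but not which channel produced it.
    """
    num_channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            sum(features[c][i][j] for c in range(num_channels)) / num_channels
            for j in range(width)
        ]
        for i in range(height)
    ]


def flip_horizontal(features: FeatureMap) -> FeatureMap:
    """Mirror every channel left to right: same content, different layout."""
    return [[row[::-1] for row in channel] for channel in features]


# Toy 2-channel, 2x2 feature map (values are made up for the example).
features = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 2.0], [0.0, 0.0]],
]
flipped = flip_horizontal(features)

content_same = content_descriptor(features) == content_descriptor(flipped)
layout_same = layout_descriptor(features) == layout_descriptor(flipped)
print(content_same, layout_same)
```

In this toy case, mirroring the feature map leaves the content view unchanged while the layout view changes. A discriminator with two such views could in principle accept new arrangements of familiar content while still penalizing implausible layouts.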