Generative Adversarial Networks (GANs) have emerged as a prominent research focus for image editing tasks, leveraging the framework's powerful image generation capabilities to produce remarkable results. However, prevailing approaches depend on extensive training datasets and explicit supervision, making it difficult to manipulate the diverse attributes of new image classes when only a few samples are available. To surmount this hurdle, we introduce TAGE, an innovative image generation network comprising three integral modules: the Codebook Learning Module (CLM), the Code Prediction Module (CPM), and the Prompt-driven Semantic Module (PSM). The CLM delves into the semantic dimensions of category-agnostic attributes and encapsulates them within a discrete codebook. This module is predicated on the idea that images are assemblages of attributes; by editing these category-independent attributes, it is therefore theoretically possible to generate images from unseen categories. The CPM then performs naturalistic image editing by predicting the indices of category-independent attribute vectors within the codebook. Additionally, the PSM generates semantic cues that are seamlessly integrated into the Transformer architecture of the CPM, enhancing the model's comprehension of the attributes targeted for editing. Guided by these cues, the model can generate images that accentuate the desired attributes more prominently while preserving the original category, even with a limited number of samples. Extensive experiments on the Animal Faces, Flowers, and VGGFaces datasets demonstrate that our method not only achieves superior performance but also exhibits greater stability than other few-shot image generation techniques.
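The codebook mechanism at the heart of this design can be illustrated with a minimal sketch. The class and parameter names below are hypothetical, and the nearest-neighbour quantization stands in for both the learned codebook (CLM) and the Transformer-based index prediction (CPM); it is not the paper's actual implementation.

```python
import numpy as np

class AttributeCodebook:
    """Toy discrete codebook of category-independent attribute vectors."""

    def __init__(self, num_codes: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Each row plays the role of one learned attribute vector.
        self.codes = rng.standard_normal((num_codes, dim))

    def quantize(self, z: np.ndarray) -> int:
        # Nearest-neighbour lookup: return the index of the closest entry.
        # In TAGE this index would instead be predicted by the CPM.
        dists = np.linalg.norm(self.codes - z, axis=1)
        return int(np.argmin(dists))

    def lookup(self, index: int) -> np.ndarray:
        # Editing swaps in a different index, hence a different attribute.
        return self.codes[index]

codebook = AttributeCodebook(num_codes=64, dim=8)
z = np.zeros(8)                 # a continuous attribute feature
idx = codebook.quantize(z)      # discretize: pick a codebook index
edited = codebook.lookup(idx)   # the discrete attribute used downstream
assert 0 <= idx < 64 and edited.shape == (8,)
```

Under this framing, "editing a category-independent attribute" reduces to predicting a new index into the codebook, which is why images of unseen categories can in principle be composed from known attribute codes.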