Few-shot image generation, which aims to produce plausible and diverse images for a category given only a few images from that category, has drawn extensive attention. Existing approaches either globally interpolate between different images or fuse local representations with pre-defined coefficients. However, such an intuitive combination of images or features exploits only the most relevant information for generation, leading to poor diversity and coarse-grained semantic fusion. To remedy this, we propose a novel textural modulation (TexMod) mechanism that injects external semantic signals into internal local representations. Parameterized by feedback from the discriminator, TexMod enables more fine-grained semantic injection while maintaining synthesis fidelity. Moreover, we develop a global structural discriminator (StructD) that explicitly guides the model to generate images with a reasonable layout and outline. Furthermore, we reinforce the frequency awareness of the model by encouraging it to distinguish frequency signals. Combining these techniques, we build a novel and effective model for few-shot image generation. Extensive experiments on three popular datasets and under various settings demonstrate the effectiveness of our model. Besides achieving state-of-the-art synthesis performance on these datasets, the proposed techniques can be seamlessly integrated into existing models for a further performance boost.
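To make two of the ideas above concrete, the following is a minimal NumPy sketch, not the implementation described in this paper. It assumes that semantic injection can be approximated by a generic per-location scale-and-shift of a feature map, and that "frequency signals" can be illustrated by the detail sub-bands of a one-level Haar transform. The function names `modulate` and `haar_high_frequency`, the projection matrices `w_gamma` and `w_beta`, and all tensor shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def modulate(features, semantic, w_gamma, w_beta):
    """Generic spatially varying affine modulation (illustrative only).

    features: (C, H, W) internal local representation
    semantic: (D, H, W) external semantic signal
    w_gamma, w_beta: (C, D) hypothetical projection matrices
    """
    # Project the semantic signal to a per-location scale and shift.
    gamma = np.einsum("cd,dhw->chw", w_gamma, semantic)
    beta = np.einsum("cd,dhw->chw", w_beta, semantic)
    # Scale around 1 so that zero weights leave the features unchanged.
    return features * (1.0 + gamma) + beta


def haar_high_frequency(image):
    """Sum of the three detail sub-bands of a one-level 2D Haar transform.

    image: (H, W) array with even H and W. Returns an (H/2, W/2) array.
    """
    a = image[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = image[0::2, 1::2]  # top-right
    c = image[1::2, 0::2]  # bottom-left
    d = image[1::2, 1::2]  # bottom-right
    row_diff = (a + b - c - d) / 2.0   # change between rows
    col_diff = (a - b + c - d) / 2.0   # change between columns
    diag_diff = (a - b - c + d) / 2.0  # diagonal change
    return row_diff + col_diff + diag_diff


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(8, 4, 4))
    sem = rng.normal(size=(3, 4, 4))
    out = modulate(feats, sem, 0.1 * rng.normal(size=(8, 3)), 0.1 * rng.normal(size=(8, 3)))
    print(out.shape)
    print(haar_high_frequency(rng.normal(size=(6, 6))).shape)
```

In this sketch the scale is written as `1.0 + gamma`, so a modulation with zero weights reduces to the identity, and a constant image yields an all-zero high-frequency map. How the modulation parameters are actually derived from discriminator feedback is not specified in the abstract and is not modeled here.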