Transformer-based models have recently achieved favorable performance in artistic style transfer thanks to their global receptive field and powerful multi-head, multi-layer attention operations. Nevertheless, the over-parameterized multi-layer structure significantly increases the parameter count and thus imposes a heavy training burden. Moreover, for the task of style transfer, a vanilla Transformer, which fuses content and style features through residual connections, is prone to content-wise distortion. In this paper, we devise a novel Transformer model, termed \emph{Master}, specifically for style transfer. On the one hand, different Transformer layers in the proposed model share a common group of parameters, which (1) reduces the total number of parameters, (2) leads to more robust training convergence, and (3) makes it easy to control the degree of stylization by freely tuning the number of stacked layers during inference. On the other hand, unlike the vanilla version, we apply a learnable scaling operation to content features before content-style feature interaction, which better preserves the original similarity between a pair of content features while ensuring stylization quality. We also propose a novel meta-learning scheme for the proposed model, so that it not only works in the typical setting of arbitrary style transfer but also adapts to the few-shot setting, by fine-tuning only the Transformer encoder layer in the few-shot stage for one specific style. Text-guided few-shot style transfer is achieved for the first time with the proposed framework. Extensive experiments demonstrate the superiority of Master under both zero-shot and few-shot style transfer settings.
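To make the two architectural ideas concrete, the following is a minimal sketch, not the actual Master implementation. It assumes a single-head cross-attention layer, a per-channel scaling vector, and additive fusion of scaled content with attended style. The names `SharedLayer`, `stylize`, and `num_params` are hypothetical and chosen only for illustration. The sketch shows that reusing one layer keeps the parameter count independent of depth, and that depth can be chosen freely at inference time.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


class SharedLayer:
    """A single cross-attention layer whose weights are reused at every depth."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)

        def init():
            return [[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)] for _ in range(dim)]

        self.dim = dim
        self.w_q, self.w_k, self.w_v = init(), init(), init()
        # ASSUMPTION: the learnable scaling is a per-channel vector, initialised to 1.
        self.scale = [1.0] * dim

    def num_params(self):
        # Three projection matrices plus the scaling vector; depth does not appear.
        return 3 * self.dim * self.dim + self.dim

    def __call__(self, content, style):
        # Scale content features before they interact with style features.
        scaled = [[c * s for c, s in zip(tok, self.scale)] for tok in content]
        q = matmul(scaled, self.w_q)
        k = matmul(style, self.w_k)
        v = matmul(style, self.w_v)
        norm = math.sqrt(self.dim)
        # Each content token attends over all style tokens.
        attn = [softmax([sum(a * b for a, b in zip(qi, kj)) / norm for kj in k]) for qi in q]
        mixed = matmul(attn, v)
        # ASSUMPTION: scaled content and attended style are fused by addition.
        return [[c + m for c, m in zip(ct, mt)] for ct, mt in zip(scaled, mixed)]


def stylize(content, style, layer, num_layers):
    """Apply the same layer num_layers times; zero passes returns the content unchanged."""
    out = content
    for _ in range(num_layers):
        out = layer(out, style)
    return out
```

Calling `stylize(content, style, layer, 1)` and `stylize(content, style, layer, 3)` with the same `layer` object uses identical weights in both cases; only the number of passes differs, which is the knob the abstract describes for controlling the degree of stylization.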