FISTNet: FusIon of STyle-path generative Networks for Facial Style Transfer

With the surge in emerging technologies such as the Metaverse, spatial computing, and generative AI, facial style transfer has attracted considerable interest from researchers and startup enthusiasts alike. StyleGAN methods have paved the way for transfer-learning strategies that reduce the dependency on large volumes of training data. However, StyleGAN methods tend to overfit, which introduces artifacts into the generated facial images. Studies such as DualStyleGAN proposed multipath networks, but these networks must be trained for a specific style rather than generating a fusion of facial styles at once. In this paper, we propose a FusIon of STyles (FIST) network for facial images that leverages pre-trained multipath style transfer networks to address the lack of large training datasets while fusing multiple styles at the output. We leverage pre-trained StyleGAN networks with an external style path that uses a residual modulation block instead of a transform coding block. The method also preserves facial structure, identity, and details via the gated mapping unit introduced in this study. Together, these components enable us to train the network with a very limited amount of data while generating high-quality stylized images. Our training process adopts a curriculum learning strategy to perform efficient, flexible style and model fusion in the generative space. Extensive experiments demonstrate the superiority of FISTNet over existing state-of-the-art methods.
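The abstract does not specify how multiple styles are combined at the output. As a purely illustrative sketch, assuming each style is represented by a fixed-length style code (as in StyleGAN-family models), a weighted fusion of styles can be expressed as a convex combination of those codes. The function name `fuse_style_codes` and the normalized-weight scheme are hypothetical and chosen for illustration; they are not FISTNet's published fusion mechanism.

```python
from __future__ import annotations

from typing import Sequence


def fuse_style_codes(codes: Sequence[Sequence[float]], weights: Sequence[float]) -> list[float]:
    """Hypothetical helper: blend several style codes via a normalized weighted sum.

    This is a generic latent-interpolation sketch, not the FISTNet method.
    """
    if not codes or len(codes) != len(weights):
        raise ValueError("need one weight per style code")
    dim = len(codes[0])
    if any(len(c) != dim for c in codes):
        raise ValueError("all style codes must share the same dimension")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    # Normalize so the weights sum to 1, making the result a convex combination.
    norm = [w / total for w in weights]
    # Fuse each dimension independently as the weighted sum across styles.
    return [sum(w * c[i] for w, c in zip(norm, codes)) for i in range(dim)]
```

For example, `fuse_style_codes([[1.0, 0.0], [0.0, 1.0]], [3, 1])` returns `[0.75, 0.25]`, weighting the first style three times as heavily as the second. Normalizing the weights keeps the fused code inside the convex hull of the input codes, which may help avoid extrapolating beyond the region spanned by the chosen styles.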