ICDaeLST: Intensity-Controllable Detail Attention-enhanced for Lightweight Fast Style Transfer

Mainstream style transfer methods usually rely on a pre-trained deep convolutional neural network (VGG) as the encoder, or on more complex model structures, to achieve better stylization. As a result, they run extremely slowly when resources are limited or when processing high-resolution images such as 4K, which severely limits the practical value of style transfer models. We introduce ICDaeLST, a lightweight and fast style transfer model with intensity-controllable detail attention enhancement. The model adopts a minimal, shallow, and small architecture, forming a very compact model for efficient forward inference. Although its structure is simple and its parameter count is limited, we achieve better overall color and texture structure matching by introducing a style discriminator. We design an additional global semantic invariance loss to preserve the semantic and structural information of the content image from a high-level global perspective, and a shallow detail attention enhancement module to preserve its detail information from a low-level perspective. We also achieve, for the first time, controllable intensity during inference: the degree of detail retention and texture structure transfer can be adjusted according to subjective judgment, to accommodate different users' subjective evaluation of stylization results. Compared with the current best-performing and most lightweight models, our model achieves better style transfer quality and better retention of content structure and detail, with a smaller model size (17-250 times smaller) and faster speed (0.26-6.5 times faster), and it achieves the fastest processing speed of 0.38 s on 4K high-resolution images.
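The abstract does not say how the inference-time intensity control is implemented. As a rough illustration of the general idea only, the sketch below uses a common technique from other style transfer work: linear interpolation between content features and stylized features with a user-chosen weight. The function name `blend_intensity`, the parameter `alpha`, and the flat-list feature representation are all hypothetical and are not taken from ICDaeLST.

```python
from typing import List


def blend_intensity(content: List[float], stylized: List[float], alpha: float) -> List[float]:
    """Linearly interpolate between content and stylized feature values.

    ASSUMPTION: this is a generic interpolation scheme, not the mechanism
    described in the ICDaeLST paper. alpha = 0.0 keeps the content features
    unchanged; alpha = 1.0 applies the full stylization.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(content) != len(stylized):
        raise ValueError("feature vectors must have the same length")
    # Element-wise convex combination of the two feature vectors.
    return [(1.0 - alpha) * c + alpha * s for c, s in zip(content, stylized)]


if __name__ == "__main__":
    content_feat = [0.0, 1.0]    # hypothetical content features
    stylized_feat = [1.0, 0.0]   # hypothetical stylized features
    for a in (0.0, 0.5, 1.0):
        print(a, blend_intensity(content_feat, stylized_feat, a))
```

In this kind of scheme a single scalar lets the user trade detail retention against stylization strength without retraining. Whether ICDaeLST exposes one control or several is not stated in the abstract.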

