Mainstream style transfer methods typically use a pre-trained deep convolutional neural network (VGG) as the encoder, or adopt more complex model structures to achieve better stylization. As a result, processing is extremely slow in practical settings where computational resources are limited or high-resolution images (e.g., 4K) must be processed, which severely limits the practical value of style transfer models. We introduce ICDaeLST, a lightweight and fast style transfer model with controllable detail attention enhancement. The model adopts a minimal, shallow, and small architecture, yielding a very compact model that supports efficient forward inference. Despite its simple structure and limited number of parameters, we achieve better overall color and texture-structure matching by introducing a style discriminator. We also design an additional global semantic invariance loss that preserves the semantic and structural information of the content image from a high-level, global perspective. Finally, we design a shallow detail attention enhancement module that preserves the detail information of the content image from a low-level, local perspective. In addition, we achieve, for the first time, controllable stylization intensity at inference time: users can adjust the degree of detail retention and texture-structure transfer according to their own subjective judgment of the stylized result. Compared with the current best-performing and most lightweight models, our model achieves better style transfer quality and better retention of content structure and detail. It is also smaller (17-250 times smaller) and faster (0.26-6.5 times faster), and it achieves the fastest processing time, 0.38 s, on 4K high-resolution images.