Image style transfer aims to integrate the visual patterns of a specific artistic style into a content image while preserving its content structure. Existing methods rely mainly on generative adversarial networks (GANs) or Stable Diffusion (SD). GAN-based approaches built on CNNs or Transformers struggle to capture local and global dependencies jointly, leading to artifacts and disharmonious patterns. SD-based methods reduce such issues but often fail to preserve content structure and suffer from slow inference. To address these issues, we revisit GANs and propose a Mamba-based generator, termed StyMam, that produces high-quality stylized images without introducing artifacts or disharmonious patterns. Specifically, the generator combines a residual dual-path strip scanning mechanism, which efficiently captures local texture features, with a channel-reweighted spatial attention module, which models global dependencies. Extensive qualitative and quantitative experiments demonstrate that the proposed method outperforms state-of-the-art algorithms in both quality and speed.
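The abstract does not specify how the channel-reweighted spatial attention module is implemented. Below is a minimal NumPy sketch of one common reading of that name, assuming a squeeze-and-excitation-style channel gate followed by plain spatial self-attention; the function name, the two projection matrices `w1` and `w2`, and the overall wiring are all illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def channel_reweighted_spatial_attention(x, w1, w2):
    """Hypothetical sketch. x: feature map of shape (C, H, W).
    w1: (C//r, C) and w2: (C, C//r) are assumed gating projections."""
    C, H, W = x.shape

    # Channel reweighting (squeeze-and-excitation style):
    # global average pool -> bottleneck MLP -> sigmoid gates per channel.
    pooled = x.mean(axis=(1, 2))                                   # (C,)
    hidden = np.maximum(w1 @ pooled, 0.0)                          # ReLU bottleneck
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))                   # (C,) in (0, 1)
    xc = x * gates[:, None, None]                                  # reweighted channels

    # Spatial self-attention over the reweighted features:
    # each of the H*W positions attends to every other position,
    # which is what lets the module model global dependencies.
    tokens = xc.reshape(C, H * W)                                  # (C, N)
    scores = tokens.T @ tokens / np.sqrt(C)                        # (N, N) similarities
    scores -= scores.max(axis=-1, keepdims=True)                   # stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)                       # rows sum to 1
    out = (tokens @ attn.T).reshape(C, H, W)                       # aggregate globally
    return out
```

Because the attention matrix is dense over all `H * W` positions, every output location mixes information from the whole feature map, in contrast to the strip scanning path, which (per the abstract) handles local texture.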