Neural style transfer (NST) has evolved significantly in recent years. Yet despite this rapid progress, existing NST methods either struggle to transfer aesthetic information from a style image effectively or suffer from high computational cost and inefficient feature disentanglement because they rely on pre-trained models. This work proposes a lightweight but effective model, AesFA (Aesthetic Feature-Aware NST). The primary idea is to decompose the image by frequency to better disentangle aesthetic styles from the reference image, while training the entire model end-to-end so that pre-trained models are excluded completely at inference. To improve the network's ability to extract more distinct representations and further enhance stylization quality, this work introduces a new aesthetic feature contrastive loss. Extensive experiments and ablations show that the approach not only outperforms recent NST methods in stylization quality but also achieves faster inference. Code is available at https://github.com/Sooyyoungg/AesFA.
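As a rough illustration of what decomposing an image by frequency can mean, the following is a minimal sketch and not the AesFA implementation. The abstract does not say how the decomposition is performed, so the sketch assumes a simple stand-in: the low-frequency part is a block-averaged, re-upsampled copy of the image, and the high-frequency part is the residual. The function name `split_frequencies` and the `factor` parameter are hypothetical.

```python
import numpy as np


def split_frequencies(image: np.ndarray, factor: int = 2):
    """Split an (H, W) or (H, W, C) image into low- and high-frequency parts.

    Assumption for illustration only: "low frequency" is approximated by
    block-average pooling followed by nearest-neighbour upsampling, and
    "high frequency" is whatever remains (image - low).
    """
    h, w = image.shape[:2]
    if h % factor or w % factor:
        raise ValueError("height and width must be divisible by factor")

    img = image.astype(np.float64)

    # Group pixels into factor x factor blocks and average each block.
    blocks = img.reshape(h // factor, factor, w // factor, factor, *img.shape[2:])
    pooled = blocks.mean(axis=(1, 3))

    # Nearest-neighbour upsample back to the original resolution.
    low = pooled.repeat(factor, axis=0).repeat(factor, axis=1)

    # The residual carries the fine detail (edges, texture).
    high = img - low
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8, 3))
    low, high = split_frequencies(image)
    # The two parts sum back to the original image.
    print(np.allclose(low + high, image))
```

In this sketch the low-frequency part holds smooth color and layout, while the high-frequency part holds edges and texture. A learned model would presumably use a trainable decomposition rather than fixed pooling.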