In recent years, language-driven artistic style transfer has emerged as a new type of style transfer technique, eliminating the need for a reference style image by using natural language descriptions of the style. The first model to achieve this, called CLIPstyler, has demonstrated impressive stylisation results. However, its lengthy optimisation procedure at runtime for each query limits its suitability for many practical applications. In this work, we present FastCLIPstyler, a generalised text-based image style transfer model capable of stylising images in a single forward pass for arbitrary text inputs. Furthermore, we introduce EdgeCLIPstyler, a lightweight model designed for compatibility with resource-constrained devices. Through quantitative and qualitative comparisons with state-of-the-art approaches, we demonstrate that our models achieve superior stylisation quality based on measurable metrics while offering significantly improved runtime efficiency, particularly on edge devices.