We present StyleMamba, an efficient image style transfer framework that translates text prompts into corresponding visual styles while preserving the content integrity of the original images. Existing text-guided stylization methods require hundreds of training iterations and consume substantial computing resources. To speed up the process, we propose a conditional State Space Model for efficient text-driven image style transfer, dubbed StyleMamba, which sequentially aligns image features to the target text prompts. To enhance local and global style consistency between text and image, we propose masked and second-order directional losses that optimize the stylization direction, reducing training iterations by a factor of 5 and inference time by a factor of 3. Extensive experiments and qualitative evaluations confirm the robust and superior stylization performance of our method compared to existing baselines.
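To make the directional-loss idea concrete, the following is a minimal sketch of a first-order text-guided directional loss in the CLIP style: it aligns the shift in image features (stylized minus source) with the shift in text features (style prompt minus source prompt) via cosine similarity. This is an illustration under assumptions, not the paper's exact loss; the function name `directional_loss` is hypothetical, and the vectors stand in for embeddings that would in practice come from a frozen encoder. The masked and second-order variants described above build on this same alignment principle.

```python
import numpy as np

def directional_loss(img_src, img_sty, txt_src, txt_sty):
    """First-order directional loss (sketch, CLIP-style).

    Encourages the image-feature shift (stylized - source) to point in
    the same direction as the text-feature shift (style prompt - source
    prompt). The inputs are plain numpy vectors standing in for encoder
    embeddings; this is an illustrative assumption, not the exact loss
    used in the paper.
    """
    d_img = img_sty - img_src          # direction of the image change
    d_txt = txt_sty - txt_src          # direction requested by the text
    cos = d_img @ d_txt / (np.linalg.norm(d_img) * np.linalg.norm(d_txt) + 1e-8)
    return 1.0 - cos                   # 0 when perfectly aligned, 2 when opposite
```

For example, if the stylized image moves exactly along the text direction, the loss is near 0; if it moves in the opposite direction, the loss approaches 2.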