ArtFusion: Controllable Arbitrary Style Transfer using Dual Conditional Latent Diffusion Models

Arbitrary Style Transfer (AST) aims to transform images by adopting the style of any selected artwork. However, accommodating diverse and subjective user preferences remains a significant challenge: some users wish to preserve distinct content structures, while others favor more pronounced stylization. Despite advances in feed-forward AST methods, their limited customizability hinders practical application. We propose a new approach, ArtFusion, which provides a flexible balance between content and style. In contrast to traditional methods that rely on biased similarity losses, ArtFusion uses our Dual Conditional Latent Diffusion Probabilistic Models (Dual-cLDM). This approach mitigates repetitive patterns and enhances subtle artistic aspects such as brush strokes and genre-specific features. Although conditional diffusion probabilistic models (cDM) have shown promising results in various generative tasks, applying them to style transfer is challenging because they require paired training data. ArtFusion overcomes this issue, offering more practical and controllable stylization. A key element of our approach is using a single image as both the content and the style condition during training while maintaining effective stylization during inference. ArtFusion outperforms existing approaches in controllability and in the faithful presentation of artistic details, providing evidence of its style transfer capabilities. Furthermore, the Dual-cLDM used in ArtFusion has potential for a variety of complex multi-condition generative tasks, broadening the impact of our research.
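
The abstract does not spell out how the content-style balance is exposed or how single-image training is set up. The following is a minimal sketch of one plausible reading, assuming a classifier-free-guidance-style scheme with two conditions. The function names, the additive combination rule, the `p_drop` value, and the weights `w_content` and `w_style` are all hypothetical illustrations and are not taken from the paper.

```python
import numpy as np


def sample_training_conditions(image, rng, p_drop=0.1):
    """Build (content, style) conditions from ONE image (assumed scheme).

    The same image serves as both conditions; each is independently
    replaced by None (a "null" condition) with probability p_drop so that
    a model could also learn partially conditioned predictions.
    """
    content = None if rng.random() < p_drop else image
    style = None if rng.random() < p_drop else image
    return content, style


def combine_dual_guidance(eps_uncond, eps_content, eps_style,
                          w_content, w_style):
    """Blend three noise predictions with two independent weights.

    Hypothetical additive rule: start from the unconditional prediction
    and move toward the content-only and style-only predictions by
    w_content and w_style respectively.
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_content = np.asarray(eps_content, dtype=float)
    eps_style = np.asarray(eps_style, dtype=float)
    return (eps_uncond
            + w_content * (eps_content - eps_uncond)
            + w_style * (eps_style - eps_uncond))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.zeros((4, 4, 3))  # stand-in for a training image
    content, style = sample_training_conditions(image, rng)

    # Stand-ins for three denoiser outputs at one sampling step.
    eps_u = np.zeros(3)
    eps_c = np.ones(3)
    eps_s = np.full(3, 2.0)
    # Raising w_style relative to w_content favors stylization.
    print(combine_dual_guidance(eps_u, eps_c, eps_s, 1.0, 0.5))
```

Under these assumptions, the two weights act as separate dials at inference time: a larger `w_content` pulls the sample toward the content-conditioned prediction, and a larger `w_style` pulls it toward the style-conditioned one.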
