Recent advances in style and appearance transfer are impressive, but most methods isolate global style from local appearance transfer, neglecting semantic correspondence. Additionally, image and video tasks are typically handled in isolation, with little focus on integrating them for video transfer. To address these limitations, we introduce a novel task, Semantic Style Transfer, which involves transferring style and appearance features from a reference image to target visual content based on semantic correspondence. We subsequently propose a training-free method, Semantix, an energy-guided sampler designed for Semantic Style Transfer that simultaneously guides both style and appearance transfer based on the semantic understanding capacity of pre-trained diffusion models. Additionally, as a sampler, Semantix can be seamlessly applied to both image and video models, enabling semantic style transfer to generalise across various visual media. Specifically, after inverting both reference and context images or videos into the noise space via SDEs, Semantix utilizes a meticulously crafted energy function to guide the sampling process, comprising three key components: Style Feature Guidance, Spatial Feature Guidance, and Semantic Distance as a regularisation term. Experimental results demonstrate that Semantix not only effectively accomplishes semantic style transfer across images and videos, but also surpasses existing state-of-the-art solutions in both fields. The project website is available at https://huiang-he.github.io/semantix/
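The sampling scheme the abstract describes — invert the inputs to noise, then steer denoising with the gradient of an energy function — can be sketched generically as follows. This is a minimal illustration of energy-guided sampling, not the paper's implementation: the function names, the plain gradient update, and the toy quadratic energy used below are all assumptions; in Semantix the energy would combine Style Feature Guidance, Spatial Feature Guidance, and the Semantic Distance regulariser.

```python
import torch

def energy_guided_step(x_t, denoiser, energy_fn, guidance_scale=1.0):
    """One generic energy-guided sampling step (a sketch, not Semantix's update).

    x_t:            current noisy latent
    denoiser:       maps the latent to a denoised estimate
    energy_fn:      differentiable scalar energy E(x_t) to be minimised
                    (stands in for style + spatial + semantic-distance terms)
    guidance_scale: strength of the energy guidance
    """
    x = x_t.detach().requires_grad_(True)
    energy = energy_fn(x)                     # scalar E(x_t)
    grad = torch.autograd.grad(energy, x)[0]  # ∇_x E(x_t)
    # Nudge the denoised estimate toward lower energy.
    return (denoiser(x_t) - guidance_scale * grad).detach()
```

With an identity "denoiser" and a quadratic energy pulling toward a target latent, each step moves the sample proportionally down the energy gradient, which is the basic mechanism any of the three guidance terms would exploit.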