Automatically transferring the dynamic texture of a given video to a target still image is a challenging and still open problem. In this paper, we propose a simple yet effective model for this task that combines PatchMatch and Transformers. The key idea is to decompose dynamic texture transfer into two stages. In the first stage, a distance-map-guided texture transfer module based on the PatchMatch algorithm synthesizes the start frame of the target video with the desired dynamic texture. In the second stage, the synthesized image is decomposed into structure-agnostic patches, and the subsequent patches corresponding to each of them are predicted by Transformers equipped with a VQ-VAE, exploiting their strong capability for processing long discrete sequences. Once all patches are obtained, a Gaussian weighted average merging strategy smoothly assembles them into each frame of the target stylized video. Experimental results demonstrate the effectiveness and superiority of the proposed method over state-of-the-art approaches to dynamic texture transfer.
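The abstract does not spell out the merging formula, so the following is only a minimal sketch of one common form of Gaussian weighted average merging, assuming that each output pixel is the average of every overlapping patch value at that location, weighted by a 2D Gaussian centred on each patch. The function names (`gaussian_window`, `split_into_patches`, `merge_patches`), the patch size, stride and `sigma`, and the single-channel nested-list frame representation are all hypothetical choices for illustration, not the authors' implementation.

```python
import math
from typing import List, Tuple

# Hypothetical representation: a single-channel frame or patch as rows of floats.
Grid = List[List[float]]


def gaussian_window(size: int, sigma: float) -> Grid:
    """Unnormalised 2D Gaussian weights, equal to 1.0 at the patch centre."""
    c = (size - 1) / 2.0
    return [[math.exp(-((y - c) ** 2 + (x - c) ** 2) / (2.0 * sigma ** 2))
             for x in range(size)]
            for y in range(size)]


def _starts(extent: int, size: int, stride: int) -> List[int]:
    """Patch start offsets along one axis; the last patch is snapped to the border."""
    starts = list(range(0, extent - size + 1, stride))
    if starts[-1] != extent - size:
        starts.append(extent - size)
    return starts


def split_into_patches(img: Grid, size: int, stride: int) -> List[Tuple[int, int, Grid]]:
    """Cut overlapping size x size patches; returns (top, left, patch) triples."""
    h, w = len(img), len(img[0])
    return [(top, left, [row[left:left + size] for row in img[top:top + size]])
            for top in _starts(h, size, stride)
            for left in _starts(w, size, stride)]


def merge_patches(patches, height: int, width: int, size: int, sigma: float) -> Grid:
    """Assemble a frame as the Gaussian-weighted average of all overlapping patches."""
    g = gaussian_window(size, sigma)
    acc = [[0.0] * width for _ in range(height)]    # running sum of weight * value
    wsum = [[0.0] * width for _ in range(height)]   # running sum of weights
    for top, left, patch in patches:
        for dy in range(size):
            for dx in range(size):
                acc[top + dy][left + dx] += g[dy][dx] * patch[dy][dx]
                wsum[top + dy][left + dx] += g[dy][dx]
    # Pixels covered by no patch stay 0.0 (cannot happen with split_into_patches).
    return [[acc[y][x] / wsum[y][x] if wsum[y][x] > 0 else 0.0
             for x in range(width)]
            for y in range(height)]


# Usage: two constant patches (0.0 and 1.0) that overlap in columns 2-3
# blend gradually across the overlap instead of meeting at a hard seam.
a = (0, 0, [[0.0] * 4 for _ in range(4)])
b = (0, 2, [[1.0] * 4 for _ in range(4)])
row = merge_patches([a, b], height=4, width=6, size=4, sigma=1.5)[0]
print([round(v, 3) for v in row])
```

The Gaussian weighting is what makes the assembly smooth: pixels near a patch border, where independently predicted patches are most likely to disagree, contribute less than pixels near the patch centre, so overlapping predictions are cross-faded rather than pasted over one another.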