Flow-based propagation and the spatiotemporal Transformer are two mainstream mechanisms in video inpainting (VI). Despite their effectiveness, both still suffer from limitations that affect performance. Previous propagation-based approaches operate separately in either the image domain or the feature domain. Global image propagation isolated from learning may cause spatial misalignment due to inaccurate optical flow. Moreover, memory or computational constraints limit the temporal range of feature propagation and video Transformers, preventing exploration of correspondence information from distant frames. To address these issues, we propose an improved framework, called ProPainter, which involves enhanced ProPagation and an efficient Transformer. Specifically, we introduce dual-domain propagation that combines the advantages of image and feature warping, exploiting global correspondences reliably. We also propose a mask-guided sparse video Transformer, which achieves high efficiency by discarding unnecessary and redundant tokens. With these components, ProPainter outperforms prior art by a large margin of 1.46 dB in PSNR while maintaining appealing efficiency.
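As a rough illustration of the idea behind mask-guided sparsity, the sketch below keeps only those non-overlapping windows of a frame that contain at least one masked pixel and skips the rest. The abstract does not specify how tokens are selected, so the window-level rule, the function names `windows_to_keep` and `kept_fraction`, and the toy mask are all assumptions made for illustration, not the ProPainter implementation.

```python
from typing import List, Tuple

Mask = List[List[int]]  # 1 marks a pixel to be inpainted, 0 a known pixel


def windows_to_keep(mask: Mask, win: int) -> List[Tuple[int, int]]:
    """Return (window_row, window_col) indices of the non-overlapping
    win x win windows that contain at least one masked pixel.

    Assumption: a window with no masked pixel needs no attention
    computation, so it is dropped from the token set.
    """
    height, width = len(mask), len(mask[0])
    if height % win or width % win:
        raise ValueError("mask size must be divisible by the window size")
    kept = []
    for wr in range(height // win):
        for wc in range(width // win):
            rows = mask[wr * win:(wr + 1) * win]
            # Keep the window if any pixel inside it is masked.
            if any(any(row[wc * win:(wc + 1) * win]) for row in rows):
                kept.append((wr, wc))
    return kept


def kept_fraction(mask: Mask, win: int) -> float:
    """Fraction of windows that survive the mask-based filter."""
    height, width = len(mask), len(mask[0])
    total = (height // win) * (width // win)
    return len(windows_to_keep(mask, win)) / total


# Hypothetical 8x8 mask with a 3x3 hole covering rows 1-3 and columns 1-3.
demo_mask = [[0] * 8 for _ in range(8)]
for r in range(1, 4):
    for c in range(1, 4):
        demo_mask[r][c] = 1

print(windows_to_keep(demo_mask, 4))
print(kept_fraction(demo_mask, 4))
```

In this toy case the hole lies entirely inside one of the four 4x4 windows, so three quarters of the windows would be skipped. A real model would apply such a filter to feature tokens across frames rather than to raw pixels of a single frame.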