Recent video inpainting methods often employ image-to-video (I2V) priors to model temporal consistency across masked frames. While effective under moderate degradation, these methods struggle when content is severely corrupted and tend to overlook spatiotemporal stability, resulting in insufficient control over the later portion of the video. To address these limitations, we decouple video inpainting into two sub-tasks: multi-frame-consistent image inpainting and masked-region motion propagation. We propose VidSplice, a novel framework that introduces spaced-frame priors to guide the inpainting process with spatiotemporal cues. To enhance spatial coherence, we design a CoSpliced Module that performs a first-frame propagation strategy, diffusing the content of the initial frame into subsequent reference frames through a splicing mechanism. In addition, we introduce a dedicated context controller module that encodes coherent priors after frame duplication and injects the spliced video into the I2V generative backbone, effectively suppressing content distortion during generation. Extensive evaluations demonstrate that VidSplice achieves competitive performance across diverse video inpainting scenarios. Moreover, its design significantly improves both foreground alignment and motion stability, outperforming existing approaches.
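To make the two-stage decoupling concrete, the following is a minimal PyTorch sketch of the pipeline as the abstract describes it. All names here (VidSpliceSketch, ContextController, splice_spaced_frames), the uniform key-frame stride, and the stub inpainter and I2V backbone are hypothetical illustrations of the idea, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class ContextController(nn.Module):
    """Encodes the spliced conditioning video into per-frame features
    that are injected into the I2V backbone (interface assumed)."""

    def __init__(self, in_ch: int = 3, dim: int = 64):
        super().__init__()
        self.encoder = nn.Conv3d(in_ch, dim, kernel_size=3, padding=1)

    def forward(self, spliced):                    # (B, C, T, H, W)
        return self.encoder(spliced)               # (B, dim, T, H, W)


def splice_spaced_frames(keys, num_frames, stride):
    """First-frame propagation by splicing: each inpainted key frame is
    duplicated forward until the next key frame, yielding a dense video."""
    K = keys.shape[2]                              # keys: (B, C, K, H, W)
    idx = torch.clamp(torch.arange(num_frames) // stride, max=K - 1)
    return keys[:, :, idx]                         # (B, C, T, H, W)


class VidSpliceSketch(nn.Module):
    def __init__(self, image_inpainter, i2v_backbone, dim: int = 64):
        super().__init__()
        self.image_inpainter = image_inpainter     # sub-task 1: consistent image inpainting
        self.i2v_backbone = i2v_backbone           # sub-task 2: masked-region motion propagation
        self.controller = ContextController(dim=dim)

    def forward(self, video, mask, stride: int = 4):
        # 1) Inpaint spaced key frames, propagating first-frame content.
        keys = self.image_inpainter(video[:, :, ::stride], mask[:, :, ::stride])
        # 2) Splice: duplicate key frames into a dense conditioning video.
        spliced = splice_spaced_frames(keys, video.shape[2], stride)
        # 3) Encode coherent priors and inject them into the I2V backbone.
        cond = self.controller(spliced)
        return self.i2v_backbone(video, mask, cond)


# Toy usage with stub components (random weights, no pretrained priors).
inpaint_stub = lambda frames, masks: frames * (1 - masks)                 # zero out holes
i2v_stub = lambda vid, m, c: vid * (1 - m) + c.mean(1, keepdim=True) * m  # naive fill
model = VidSpliceSketch(inpaint_stub, i2v_stub)

video = torch.rand(1, 3, 16, 64, 64)               # (B, C, T, H, W)
mask = (torch.rand(1, 1, 16, 64, 64) > 0.7).float()
print(model(video, mask, stride=4).shape)          # torch.Size([1, 3, 16, 64, 64])
```

In this reading, the spliced video carries spatial appearance from the inpainted key frames across every time step, so the I2V backbone only needs to recover motion in the masked region rather than hallucinate content for late frames.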