HL-OutPaint: Coarse-to-Fine Video Outpainting for High-Resolution Long-Range Videos

Video outpainting generates plausible visual content beyond the original spatial extent of a video and plays a key role in adapting videos to diverse display formats. To support such use cases, it must enable large spatial extrapolation over long sequences. However, most existing methods address only one of these two challenges or lack explicit mechanisms for enforcing global spatio-temporal consistency, which limits their applicability when both are required. In this paper, we propose HL-OutPaint, a framework for high-resolution outpainting of long videos. Our approach follows a coarse-to-fine strategy implemented as a two-stage pipeline. We first construct Global Coarse Guidance (GCG), a low-resolution representation that captures the global structure and dominant motion of the entire video. Rather than being obtained by naive downsampling, GCG is built with a novel global-local frame swapping mechanism that couples sparse global keyframes with local temporal windows and exchanges information between them during sampling. This allows GCG to encode both long-term structural consistency and short-term temporal dynamics in a single representation. Guided by GCG, HL-OutPaint then performs high-resolution outpainting to generate spatially detailed and temporally consistent content. By separating global structure modeling from fine-grained synthesis, the framework achieves stable and coherent generation under large spatial expansion and over long video sequences. Extensive experiments show that HL-OutPaint outperforms existing methods in challenging scenarios involving wide spatial extrapolation and long video sequences.
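The abstract does not specify how global-local frame swapping is implemented. The following is only a minimal, hypothetical sketch of one way sparse global keyframes and local temporal windows could exchange information during an iterative sampling loop. It assumes keyframes sampled uniformly every `stride` frames, non-overlapping local windows of `size` frames, and an alternating swap rule (global overwrites local keyframe slots on even steps; local writes back into the global branch on odd steps). The names `keyframe_indices`, `local_windows`, `swap`, `sample`, and the toy denoisers are illustrative placeholders, not HL-OutPaint's actual method or API.

```python
import numpy as np


def keyframe_indices(num_frames: int, stride: int) -> np.ndarray:
    # Hypothetical: sparse global keyframes sampled uniformly every `stride` frames.
    return np.arange(0, num_frames, stride)


def local_windows(num_frames: int, size: int) -> list:
    # Hypothetical: consecutive, non-overlapping temporal windows of `size` frames
    # (the last window may be shorter).
    return [np.arange(s, min(s + size, num_frames)) for s in range(0, num_frames, size)]


def swap(local_latents: np.ndarray, global_latents: np.ndarray,
         key_idx: np.ndarray, step: int):
    # Hypothetical exchange rule, assumed for illustration only:
    #   even step: global branch overwrites the keyframe slots of the local branch
    #   odd step:  local branch writes its keyframe slots back into the global branch
    local_latents = local_latents.copy()
    global_latents = global_latents.copy()
    if step % 2 == 0:
        local_latents[key_idx] = global_latents
    else:
        global_latents = local_latents[key_idx].copy()
    return local_latents, global_latents


def sample(num_frames, channels, stride, size, steps,
           denoise_local, denoise_global, seed=0):
    # Toy sampling loop: each branch is updated by a placeholder denoiser,
    # then the two branches exchange information through `swap`.
    rng = np.random.default_rng(seed)
    key_idx = keyframe_indices(num_frames, stride)
    windows = local_windows(num_frames, size)
    local = rng.standard_normal((num_frames, channels))
    glob = local[key_idx].copy()
    for step in range(steps):
        glob = denoise_global(glob, step)          # keyframes processed jointly
        for w in windows:                          # each window processed independently
            local[w] = denoise_local(local[w], step)
        local, glob = swap(local, glob, key_idx, step)
    return local
```

With `num_frames=12`, `stride=4`, and `size=4`, the keyframes are frames 0, 4, and 8, so every local window contains one keyframe; under these assumptions this is what lets long-range context from the global branch reach each window. Choosing `stride <= size` guarantees that coverage for non-overlapping windows.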
