We propose FlowAnchor, a training-free framework for stable and efficient inversion-free, flow-based video editing. Inversion-free editing methods have recently shown impressive efficiency and structure preservation in image editing by directly steering the sampling trajectory with an editing signal. However, extending this paradigm to video remains challenging: such methods often fail in multi-object scenes or as the number of frames increases. We trace the root cause to instability of the editing signal in high-dimensional video latent spaces, which arises from imprecise spatial localization and from length-induced magnitude attenuation. To address this, FlowAnchor explicitly anchors both where to edit and how strongly to edit. It introduces Spatial-aware Attention Refinement, which enforces consistent alignment between textual guidance and the corresponding spatial regions, and Adaptive Magnitude Modulation, which maintains sufficient editing strength. Together, these mechanisms stabilize the editing signal and guide the flow-based evolution toward the desired target distribution. Extensive experiments demonstrate that FlowAnchor achieves more faithful, temporally coherent, and computationally efficient video editing in challenging multi-object and fast-motion scenarios. The project page is available at https://cuc-mipg.github.io/FlowAnchor.github.io/.
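For readers unfamiliar with inversion-free flow editing, the sketch below illustrates what "directly steering the sampling trajectory with an editing signal" can look like. It assumes a FlowEdit-style update (a generic formulation, not FlowAnchor's own method, which the abstract does not specify) in which the editing signal is the difference between target- and source-conditioned velocities evaluated on coupled noisy states. The velocity model `toy_velocity` is hypothetical: it is the exact velocity of a linear flow toward a single clean sample, chosen so the result can be checked. FlowAnchor's attention refinement and magnitude modulation are not reproduced here.

```python
import numpy as np


def toy_velocity(z, t, cond):
    # Hypothetical stand-in for a text-conditioned flow model v_theta(z, t, c).
    # On the linear path z_t = (1 - t) * x0 + t * eps with a single clean
    # sample x0 = cond, the exact velocity eps - x0 equals (z - cond) / t.
    return (z - cond) / t


def inversion_free_edit(x_src, c_src, c_tgt, velocity, num_steps=50, seed=0):
    """FlowEdit-style editing sketch: no inversion, only velocity differences."""
    rng = np.random.default_rng(seed)
    ts = np.linspace(1.0, 0.0, num_steps + 1)  # time runs from noise (1) to data (0)
    z_edit = x_src.copy()  # the edited trajectory starts at the source latent
    for t, t_prev in zip(ts[:-1], ts[1:]):
        noise = rng.standard_normal(x_src.shape)
        z_src = (1.0 - t) * x_src + t * noise  # freshly noised source latent
        z_tgt = z_edit + z_src - x_src         # target-branch state, coupled to z_src
        # Editing signal: target-conditioned minus source-conditioned velocity.
        delta_v = velocity(z_tgt, t, c_tgt) - velocity(z_src, t, c_src)
        z_edit = z_edit + (t_prev - t) * delta_v  # Euler step from t to t_prev
    return z_edit


x_src = np.zeros(4)
c_tgt = np.array([1.0, 2.0, 3.0, 4.0])
edited = inversion_free_edit(x_src, x_src, c_tgt, toy_velocity)
print(np.round(edited, 6))
```

In this toy setting the sampled noise cancels in `delta_v` and the trajectory lands on `c_tgt`. With a real video model, `delta_v` is noisy and high-dimensional, and its stability is exactly what the abstract argues breaks down in multi-object and long-video settings.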