Generating novel renderings of a scene along user-defined camera trajectories from a single monocular video, dubbed video retaking, is a compelling but difficult problem in content creation and visual effects. Existing geometry-guided approaches reconstruct a 4D representation from the source video and render it along the target trajectory to condition video diffusion models. However, this guidance degrades as the target camera departs from the source trajectory, leaving newly revealed regions sparse or entirely missing. We propose SierpinskiCam, which addresses this limitation by augmenting geometry-based guidance with Sierpinski dome texture cues that contain rich trackable features even under large viewpoint changes. We further introduce a reference video conditioning mechanism that appends source-video tokens to the target-token sequence and separates the two streams with negative RoPE indices, enabling appearance grounding without architectural modification or per-video adaptation. Extensive experiments show that SierpinskiCam achieves significant gains in camera controllability, geometric consistency, and video quality across diverse and challenging retaking scenarios. Project page: https://hyelinnam.github.io/SierpinskiCam/.
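The reference video conditioning described in the abstract can be illustrated with a minimal sketch. The abstract states only that source-video tokens are appended to the target-token sequence and that the two streams are separated with negative RoPE indices. Everything else here is an assumption made for illustration: the function names (`apply_rope`, `build_positions`, `condition_on_reference`), the use of a 1D temporal RoPE rather than the multi-axis variant a video diffusion transformer would likely use, and the specific choice of indices `0..T-1` for target tokens and `-R..-1` for reference tokens.

```python
import numpy as np

def apply_rope(x, positions, base=10000.0):
    """Apply 1D rotary position embedding to x of shape (N, dim), dim even.

    positions may be negative; a negative index simply rotates each
    feature pair in the opposite direction.
    """
    n, dim = x.shape
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # (dim/2,)
    theta = np.outer(positions, inv_freq)              # (N, dim/2)
    cos, sin = np.cos(theta), np.sin(theta)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out

def build_positions(num_target, num_reference):
    """Hypothetical index layout: target tokens get 0..T-1,
    reference (source-video) tokens get -R..-1."""
    target_pos = np.arange(num_target)
    reference_pos = np.arange(-num_reference, 0)
    return np.concatenate([target_pos, reference_pos])

def condition_on_reference(target_tokens, reference_tokens):
    """Append reference tokens to the target sequence and rotate each
    token by its (possibly negative) position index."""
    tokens = np.concatenate([target_tokens, reference_tokens], axis=0)
    positions = build_positions(len(target_tokens), len(reference_tokens))
    return apply_rope(tokens, positions), positions

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(4, 8))     # 4 target tokens, 8 channels
    reference = rng.normal(size=(3, 8))  # 3 source-video tokens
    rotated, positions = condition_on_reference(target, reference)
    print(positions)
    print(rotated.shape)
```

Because the index ranges of the two streams do not overlap, attention can tell reference tokens from target tokens through position alone. Under the assumptions above, this is consistent with the abstract's claim that no architectural modification is required.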