We present the content deformation field (CoDeF) as a new type of video representation. It consists of a canonical content field, which aggregates the static contents of the entire video, and a temporal deformation field, which records the transformations from the canonical image (i.e., the image rendered from the canonical content field) to each individual frame along the time axis. Given a target video, the two fields are jointly optimized to reconstruct it through a carefully tailored rendering pipeline. We deliberately introduce regularizations into the optimization process, urging the canonical content field to inherit semantics (e.g., the object shape) from the video. With this design, CoDeF naturally supports lifting image algorithms to video processing: one can apply an image algorithm to the canonical image and propagate the outcome to the entire video with the aid of the temporal deformation field. We show experimentally that CoDeF can lift image-to-image translation to video-to-video translation, and keypoint detection to keypoint tracking, without any training. More importantly, because our lifting strategy deploys the algorithm on only one image, we achieve superior cross-frame consistency in processed videos compared to existing video-to-video translation approaches, and even manage to track non-rigid objects like water and smog. The project page can be found at https://qiuyu96.github.io/CoDeF/.
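The lifting idea described above can be illustrated with a minimal sketch. It is not the CoDeF implementation: it assumes the canonical image is a dense array and that each frame's deformation is a dense grid of pixel offsets, whereas the actual method optimizes both as fields. The names `render_frame`, `lift_image_algorithm`, and the `(dy, dx)` offset convention are hypothetical and chosen only for illustration.

```python
import numpy as np


def render_frame(canonical, deformation):
    """Render one frame by sampling the canonical image through a deformation.

    canonical:   (H, W, C) array, a dense stand-in for the canonical image
                 (assumption: the real canonical content field is not a dense array).
    deformation: (h, w, 2) array of (dy, dx) offsets that map each frame pixel
                 to a location in the canonical image (hypothetical convention).
    """
    H, W = canonical.shape[:2]
    h, w = deformation.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Nearest-neighbour lookup, clamped to the canonical image bounds.
    cy = np.clip(np.rint(ys + deformation[..., 0]), 0, H - 1).astype(int)
    cx = np.clip(np.rint(xs + deformation[..., 1]), 0, W - 1).astype(int)
    return canonical[cy, cx]


def lift_image_algorithm(canonical, deformations, image_fn):
    """Apply image_fn once to the canonical image, then propagate to every frame."""
    edited = image_fn(canonical)
    return [render_frame(edited, d) for d in deformations]


# Toy data: a 4x4 canonical image and two frames.
canonical = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
deformations = [
    np.zeros((4, 4, 2)),               # frame 0: identity mapping
    np.tile([0.0, 1.0], (4, 4, 1)),    # frame 1: sample one pixel to the right
]
# The "image algorithm" here is a simple colour inversion, run only once.
frames = lift_image_algorithm(canonical, deformations, lambda img: 255.0 - img)
```

Because the image algorithm runs only once, every frame is rendered from the same edited canonical image, which is the source of the cross-frame consistency claimed above. A practical version would replace the nearest-neighbour lookup with differentiable interpolation.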