Understanding semantic intricacies and high-level concepts is essential in image sketch generation, and the challenge becomes even more formidable for videos. To address this, we propose a novel optimization-based framework for generating video sketches represented by frame-wise B\'ezier curves. Specifically, we first propose a cross-frame stroke initialization approach to warm up the location and width of each curve. We then optimize the locations of these curves using a semantic loss based on CLIP features and a newly designed consistency loss built on a self-decomposed 2D atlas network. With these design elements, the resulting sketch videos exhibit strong visual abstraction and temporal coherence. Furthermore, because the sketching process converts a video into SVG lines, our method unlocks applications in sketch-based video editing and video doodling, enabled through video composition, as exemplified in the teaser.
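To make the frame-wise representation concrete, the following is a minimal sketch, assuming each stroke is a cubic B\'ezier curve given by four 2D control points per frame. The names `cubic_bezier`, `sample_stroke`, and `temporal_smoothness` are hypothetical and introduced only for illustration. The smoothness term is a toy stand-in that penalizes stroke motion between consecutive frames; it is not the atlas-based consistency loss described above, and the CLIP-based semantic loss is not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Stroke = List[Point]  # four control points of one cubic Bezier curve (assumption)


def cubic_bezier(ctrl: Stroke, t: float) -> Point:
    """Evaluate a cubic Bezier curve at t in [0, 1] using the Bernstein form."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = ctrl
    s = 1.0 - t
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return (
        b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
        b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3,
    )


def sample_stroke(ctrl: Stroke, n: int = 8) -> List[Point]:
    """Sample n evenly spaced points (in parameter t) along one stroke."""
    return [cubic_bezier(ctrl, i / (n - 1)) for i in range(n)]


def temporal_smoothness(frames: List[Stroke], n: int = 8) -> float:
    """Toy stand-in for a consistency term (NOT the atlas-based loss):
    mean squared displacement of sampled stroke points between
    consecutive frames of the same stroke."""
    total, count = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        for (px, py), (cx, cy) in zip(sample_stroke(prev, n), sample_stroke(curr, n)):
            total += (cx - px) ** 2 + (cy - py) ** 2
            count += 1
    return total / count if count else 0.0


# One stroke tracked over two frames; the second frame is shifted by one unit in x.
frame_a: Stroke = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
frame_b: Stroke = [(x + 1.0, y) for x, y in frame_a]
penalty = temporal_smoothness([frame_a, frame_b])
```

In an optimization-based pipeline, a term of this kind would be added to a semantic objective and minimized over the control points with a differentiable rasterizer; the plain-Python version here only illustrates the data layout and the per-frame curve evaluation.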