Designing stylized cinemagraphs is challenging due to the difficulty of customizing complex and expressive flow elements. To achieve intuitive, fine-grained control over the generated cinemagraphs, sketches offer a feasible way to convey personalized design requirements beyond text inputs. In this paper, we propose Sketch2Cinemagraph, a sketch-guided framework that enables conditional generation of stylized cinemagraphs from freehand sketches. Sketch2Cinemagraph adopts text prompts for initial landscape generation and provides sketch controls for both spatial and motion cues. A latent diffusion model first generates the target stylized landscape image along with a realistic counterpart. A pre-trained object detection model then obtains masks for the fluid regions. We propose a latent motion diffusion model to estimate the motion field in the fluid regions of the generated landscape images; the input motion sketches serve as conditions that, together with the prompt, control the generated motion fields within the masked fluid regions. To synthesize cinemagraph frames, pixels within the fluid regions are warped to their target locations at each timestep by a U-Net-based frame generator. The results verify that Sketch2Cinemagraph can generate aesthetically appealing stylized cinemagraphs with continuous temporal flow from sketch inputs. We demonstrate the advantages of Sketch2Cinemagraph through qualitative and quantitative comparisons against state-of-the-art approaches.
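The frame-synthesis step described above, warping fluid pixels to their target locations at each timestep under an estimated motion field, can be sketched as a minimal Euler-integrated backward warp. This is a simplified illustration only: the function name `warp_frame`, nearest-neighbor sampling, and the constant per-pixel flow are assumptions for clarity, not the paper's actual U-Net-based frame generator.

```python
import numpy as np

def warp_frame(image, flow, mask, t):
    """Backward-warp fluid pixels by t steps of a constant (Eulerian) flow.

    image: (H, W) grayscale array.
    flow:  (H, W, 2) per-pixel displacement as (dy, dx) per timestep.
    mask:  (H, W) boolean fluid-region mask (from the detection stage).
    t:     integer timestep.

    For each fluid pixel at target location x, sample the source pixel at
    x - t * flow(x) (Euler integration, nearest-neighbor sampling).
    Pixels outside the mask are copied unchanged, keeping static regions still.
    """
    H, W = image.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Backward warping: source location = target location - t * displacement
    src_y = np.clip(np.rint(ys - t * flow[..., 0]).astype(int), 0, H - 1)
    src_x = np.clip(np.rint(xs - t * flow[..., 1]).astype(int), 0, W - 1)
    out = image.copy()
    out[mask] = image[src_y[mask], src_x[mask]]
    return out
```

In practice, repeated resampling with nearest-neighbor lookup accumulates artifacts; a learned frame generator, as in the paper, avoids this by synthesizing each frame from features rather than raw pixels.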