We introduce Artistic Cinemagraph, a fully automated method for creating cinemagraphs from text descriptions. The task is especially challenging when prompts feature imaginary elements and artistic styles, because the semantics and motion of such images are difficult to interpret. Existing single-image animation methods fall short on artistic inputs, and recent text-based video methods frequently introduce temporal inconsistencies and struggle to keep certain regions static. To address these challenges, we propose synthesizing image twins from a single text prompt: an artistic image paired with a pixel-aligned, natural-looking twin. The artistic image depicts the style and appearance described in the text prompt, while its realistic counterpart greatly simplifies layout and motion analysis. Leveraging existing natural image and video datasets, we can accurately segment the realistic image and predict plausible motion from its semantic information. Because the twins are pixel-aligned, the predicted motion can then be transferred directly to the artistic image to create the final cinemagraph. Our method outperforms existing approaches in creating cinemagraphs for natural landscapes as well as artistic and otherworldly scenes, as validated by automated metrics and user studies. Finally, we demonstrate two extensions: animating existing paintings and controlling motion directions using text.
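To illustrate why pixel alignment makes motion transfer straightforward, here is a minimal sketch under stated assumptions. It assumes the predicted motion is a single static per-frame flow field plus a binary mask of regions allowed to move, both computed on the realistic twin. Because the twins share pixel coordinates, the same arrays can index the artistic image directly. The function names `sample_nearest` and `animate_with_flow` are hypothetical, and the semi-Lagrangian backward tracing with nearest-neighbour sampling is a deliberately simple stand-in, not the authors' actual motion representation or renderer.

```python
import numpy as np


def sample_nearest(field, ys, xs):
    """Look up `field` at float coordinates (ys, xs) by rounding to the
    nearest pixel and clamping to the image border (hypothetical helper)."""
    h, w = field.shape[:2]
    yi = np.clip(np.rint(ys).astype(int), 0, h - 1)
    xi = np.clip(np.rint(xs).astype(int), 0, w - 1)
    return field[yi, xi]


def animate_with_flow(image, flow, mask, num_frames):
    """Hypothetical sketch: animate `image` with a static per-frame flow field.

    image: (H, W, C) array, e.g. the artistic twin.
    flow:  (H, W, 2) array of per-frame (dy, dx) displacements, assumed to be
           predicted on the pixel-aligned realistic twin.
    mask:  (H, W) boolean array, True where motion is allowed (e.g. water);
           every other pixel is kept static.
    Returns a list of num_frames frames; frame 0 is a copy of the input.
    """
    h, w = mask.shape
    # Zero the motion outside the animated region so those pixels never move.
    flow = flow * mask[..., None]
    ys, xs = np.meshgrid(np.arange(h, dtype=float),
                         np.arange(w, dtype=float), indexing="ij")
    src_y, src_x = ys.copy(), xs.copy()
    frames = [image.copy()]
    for _ in range(1, num_frames):
        # Trace each output pixel one more step backward along the flow,
        # reading the flow at the currently traced source location.
        step = sample_nearest(flow, src_y, src_x)
        src_y = src_y - step[..., 0]
        src_x = src_x - step[..., 1]
        # Fetch colours from the (artistic) image at the traced locations.
        frames.append(sample_nearest(image, src_y, src_x))
    return frames
```

In use, `flow` and `mask` would come from analysing the realistic twin, while `image` is the artistic twin. Pixel alignment is the only property that lets the two be mixed this way. A practical system would likely use sub-pixel interpolation and handle seamless looping, both of which this sketch omits.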