Everyone can write their stories in freeform text; it is something we all learn in school. Yet telling stories through video requires learning specialized and complicated tools. In this paper, we introduce Doki, a text-native interface for generative video authoring that aligns video creation with the natural process of writing. In Doki, writing text is the primary interaction: within a single document, users define assets, structure scenes, create shots, refine edits, and add audio. We articulate the design principles of this text-first approach and demonstrate Doki's capabilities through a series of examples. To evaluate its real-world use, we conducted a week-long deployment study with participants of varying expertise in video authoring. This work contributes a fundamental shift in generative video interfaces, demonstrating a powerful and accessible new way to craft visual stories.