Video creation has become increasingly popular, yet the expertise and effort required for editing often pose barriers to beginners. In this paper, we explore the integration of large language models (LLMs) into the video editing workflow to reduce these barriers. Our design vision is embodied in LAVE, a novel system that provides LLM-powered agent assistance and language-augmented editing features. LAVE automatically generates language descriptions for the user's footage, serving as the foundation for enabling the LLM to process videos and assist in editing tasks. When the user provides editing objectives, the agent plans and executes relevant actions to fulfill them. Moreover, LAVE allows users to edit videos through either the agent or direct UI manipulation, providing flexibility and enabling manual refinement of agent actions. Our user study, which included eight participants ranging from novices to proficient editors, demonstrated LAVE's effectiveness. The results also shed light on user perceptions of the proposed LLM-assisted editing paradigm and its impact on users' creativity and sense of co-creation. Based on these findings, we propose design implications to inform the future development of agent-assisted content editing.