Manually navigating lengthy videos to seek information or answer questions can be a tedious and time-consuming task for users. We introduce StoryNavi, a novel system powered by VLLMs that generates customised video play experiences by retrieving material from original videos. It directly answers users' queries by constructing a non-linear sequence from the identified relevant clips to form a cohesive narrative. StoryNavi offers two playback modes for the constructed video plays: 1) video-centric, which plays the original audio and skips irrelevant segments, and 2) narrative-centric, in which a generated narration guides the experience and the original audio is muted. Our technical evaluation showed adequate retrieval performance compared to human retrieval. Our user evaluation shows that maintaining narrative coherence significantly enhances user engagement when viewing disjointed video segments. However, factors such as video genre, content, and the query itself may lead to varying user preferences for the playback mode.