The Video Browser Showdown (VBS) challenges systems to deliver accurate results under strict time constraints. To meet this demand, we present Fusionista2.0, a streamlined video retrieval system optimized for speed and usability. All core modules were re-engineered for efficiency: preprocessing now relies on ffmpeg for fast keyframe extraction, optical character recognition uses Vintern-1B-v3.5 for robust multilingual text recognition, and automatic speech recognition employs faster-whisper for real-time transcription. For question answering, lightweight vision-language models provide quick responses without the computational cost of large models. Beyond these technical upgrades, Fusionista2.0 introduces a redesigned user interface with improved responsiveness, accessibility, and workflow efficiency, enabling even non-expert users to retrieve relevant content rapidly. Our evaluations show that retrieval time is reduced by up to 75% while accuracy and user satisfaction both increase, confirming Fusionista2.0 as a competitive and user-friendly system for large-scale video search.
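To make the ffmpeg-based preprocessing step concrete, the following is a minimal sketch of keyframe extraction driven from Python. It is not taken from the Fusionista2.0 implementation: the abstract only states that ffmpeg is used, so the choice of I-frames as the keyframe criterion, the helper names `build_keyframe_command` and `extract_keyframes`, the output file pattern, and the JPEG quality setting are all assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_keyframe_command(video_path: str, out_dir: str) -> list[str]:
    """Build an ffmpeg command that writes one JPEG per keyframe.

    Assumption: "keyframe" means an encoder keyframe (I-frame); the
    system described above may use a different selection criterion.
    """
    pattern = str(Path(out_dir) / "kf_%06d.jpg")  # hypothetical naming scheme
    return [
        "ffmpeg",
        "-hide_banner",
        "-skip_frame", "nokey",  # decoder option: skip all non-keyframes (must precede -i)
        "-i", video_path,
        "-vsync", "vfr",         # do not duplicate frames to fill timestamp gaps
        "-q:v", "2",             # high JPEG quality (illustrative value)
        pattern,
    ]


def extract_keyframes(video_path: str, out_dir: str) -> list[Path]:
    """Run ffmpeg and return the extracted keyframe files in order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_keyframe_command(video_path, out_dir), check=True)
    return sorted(Path(out_dir).glob("kf_*.jpg"))
```

Because only keyframes are decoded, this runs much faster than decoding every frame and sampling afterwards, which fits the speed goal stated above. On recent ffmpeg releases `-vsync` is deprecated in favor of `-fps_mode`, so the flag may need adjusting depending on the installed version.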