Video Quality Assessment (VQA) is vital for large-scale video retrieval systems: it identifies quality issues so that high-quality videos can be prioritized. In industrial systems, the characteristics of low-quality videos fall into four categories: visual issues such as mosaics and black boxes; textual issues arising from video titles and OCR content; and two kinds of semantic issues typical of AI-generated videos, namely frame incoherence and frame-text mismatch. Although such videos are prevalent in industrial settings, they have been largely overlooked in academic research, which makes them hard to identify accurately. To address this, we introduce the Multi-Branch Collaborative Network (MBCN), tailored for industrial video retrieval systems. MBCN has four branches, each designed to handle one of these quality issues. After each branch scores a video independently, we aggregate the scores using a weighted approach and a squeeze-and-excitation mechanism, so that the emphasis placed on each quality issue adapts dynamically to the scenario. We train with point-wise and pair-wise optimization objectives to keep the scores stable and reasonable. Extensive offline and online experiments on a world-level video search engine demonstrate that MBCN identifies video quality issues effectively and significantly improves the retrieval system's ranking performance. Detailed experimental analyses confirm that all four evaluation branches contribute positively. Furthermore, MBCN significantly improves recognition accuracy for low-quality AI-generated videos compared to the baseline.
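The aggregation and training objectives described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names (`se_aggregate`, `pointwise_loss`, `pairwise_loss`), the static weights, the bottleneck size, the use of binary cross-entropy as the point-wise objective, and the margin hinge as the pair-wise objective are all hypothetical and are not taken from MBCN itself. The gating is only squeeze-and-excitation in style: the four branch scores pass through a small bottleneck and a sigmoid to produce per-branch gates.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_aggregate(branch_scores, static_w, w1, w2):
    """Fuse per-branch scores of shape (batch, 4) into one score per video.

    Assumed design, not the paper's: static weights are rescaled by
    SE-style gates computed from the branch scores themselves.
    """
    hidden = np.maximum(branch_scores @ w1, 0.0)   # bottleneck + ReLU
    gates = sigmoid(hidden @ w2)                   # per-branch gate in (0, 1)
    weights = static_w * gates                     # dynamic re-weighting
    weights = weights / weights.sum(axis=1, keepdims=True)  # rows sum to 1
    fused = (weights * branch_scores).sum(axis=1)  # convex combination
    return fused, gates

def pointwise_loss(scores, labels):
    """Binary cross-entropy (assumed point-wise objective)."""
    eps = 1e-7
    s = np.clip(scores, eps, 1.0 - eps)
    return float(-np.mean(labels * np.log(s) + (1 - labels) * np.log(1 - s)))

def pairwise_loss(score_hi, score_lo, margin=0.1):
    """Margin hinge (assumed pair-wise objective): the higher-quality
    video should outscore the lower-quality one by at least `margin`."""
    return float(np.mean(np.maximum(0.0, margin - (score_hi - score_lo))))

# Toy data: 6 videos, 4 branch scores each (visual, textual,
# frame coherence, frame-text match). All values are synthetic.
rng = np.random.default_rng(0)
scores = rng.uniform(0.05, 0.95, size=(6, 4))
static_w = np.array([0.4, 0.2, 0.2, 0.2])      # hypothetical static weights
w1 = rng.normal(scale=0.5, size=(4, 2))        # hypothetical bottleneck of 2
w2 = rng.normal(scale=0.5, size=(2, 4))
fused, gates = se_aggregate(scores, static_w, w1, w2)
```

Because the normalized weights are positive and sum to one, each fused score in this sketch stays between the smallest and largest branch score for that video, which is one simple way to keep the final score in a predictable range.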