Short-form UGC video platforms such as Kwai and TikTok have become an emerging and irreplaceable mainstream media form, thriving on user-friendly engagement and kaleidoscopic content creation. However, evolving content-generation modes (e.g., special effects) and sophisticated processing workflows (e.g., artifact removal) have introduced significant challenges for UGC video quality assessment: (i) ambiguous content hinders the identification of quality-determined regions, and (ii) diverse and complicated hybrid distortions are hard to distinguish. To tackle these challenges and support the development of short-form video, we establish the first large-scale Kaleidoscope short Video database for Quality assessment, termed KVQ, which comprises 600 user-uploaded short videos and 3600 processed videos produced by diverse practical processing workflows, including pre-processing, transcoding, and enhancement. For these videos, an absolute quality score for each video and partial ranking scores among indistinguishable samples are provided by a team of professional researchers specializing in image processing. Based on this database, we propose the first short-form video quality evaluator, KSVQE, which identifies quality-determined semantics using the content understanding of large vision-language models (i.e., CLIP) and distinguishes distortions with a distortion understanding module. Experimental results show the effectiveness of KSVQE on our KVQ database and on popular VQA databases.