Short-form UGC video platforms, such as Kwai and TikTok, have become an emerging and irreplaceable mainstream media form, thriving on user-friendly engagement and kaleidoscopic content creation. However, advanced content-generation modes (e.g., special effects) and sophisticated processing workflows (e.g., de-artifacting) have introduced significant challenges for current UGC video quality assessment: (i) ambiguous content hinders the identification of quality-determining regions, and (ii) diverse and complicated hybrid distortions are hard to distinguish. To tackle these challenges and support the development of short-form video, we establish the first large-scale Kaleidoscope short Video database for Quality assessment, termed KVQ, which comprises 600 user-uploaded short videos and 3600 videos processed through diverse practical workflows, including pre-processing, transcoding, and enhancement. The absolute quality score of each video, together with partial ranking scores among hard-to-distinguish samples, was provided by a team of professional researchers specializing in image processing. Based on this database, we propose the first short-form video quality evaluator, KSVQE, which identifies quality-determining semantics using the content understanding of large vision-language models (e.g., CLIP) and distinguishes distortions using a distortion understanding module. Experimental results demonstrate the effectiveness of KSVQE on our KVQ database and on popular VQA databases.
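KVQ supplies two kinds of labels: an absolute score for every video and partial rankings among samples that are hard to tell apart. The abstract does not say how KSVQE is trained on them, so the following is only an illustrative sketch of one common pattern, not the authors' objective: a score-regression term plus a pairwise margin ranking term over the annotated pairs. The function names, the choice of mean absolute error, the margin of 0.1, and the weight of 1.0 are all hypothetical.

```python
def regression_loss(pred, mos):
    """Mean absolute error between predicted and absolute quality scores (hypothetical choice)."""
    assert len(pred) == len(mos) and pred, "need equal-length, non-empty inputs"
    return sum(abs(p - m) for p, m in zip(pred, mos)) / len(pred)


def ranking_loss(pred, pairs, margin=0.1):
    """Hinge loss over annotated pairs (i, j), read as 'video i was judged better than video j'.

    A pair contributes zero once pred[i] exceeds pred[j] by at least `margin`.
    """
    if not pairs:
        return 0.0
    return sum(max(0.0, margin - (pred[i] - pred[j])) for i, j in pairs) / len(pairs)


def combined_loss(pred, mos, pairs, weight=1.0, margin=0.1):
    """Regression on absolute scores plus a weighted ranking term on partial rankings."""
    return regression_loss(pred, mos) + weight * ranking_loss(pred, pairs, margin)


# Toy example: videos 0 and 1 share the same absolute score (3.0), but the annotators
# ranked video 1 above video 0, which only the ranking term can capture.
pred = [3.2, 3.1, 4.0]
mos = [3.0, 3.0, 4.0]
pairs = [(1, 0)]
print(combined_loss(pred, mos, pairs))
```

The ranking term matters exactly where absolute scores tie or nearly tie: in the toy example the regression term alone cannot say which of the first two videos should score higher, while the annotated pair penalizes the model for placing video 0 above video 1.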