With the explosive growth of user-generated content (UGC), UGC video quality assessment (VQA) has become increasingly important for improving users' Quality of Experience (QoE). However, most existing UGC VQA studies focus only on the visual distortions of videos and ignore the fact that a user's QoE also depends on the accompanying audio signals. In this paper, we conduct the first study of UGC audio and video quality assessment (AVQA). Specifically, we construct the first UGC AVQA database, named the SJTU-UAV database, which includes 520 in-the-wild UGC audio and video (A/V) sequences, and we conduct a user study to obtain the mean opinion scores (MOS) of the A/V sequences. The content of the SJTU-UAV database is then analyzed from both the audio and video aspects to characterize the database. We also design a family of AVQA models that fuse popular VQA methods with audio features via a support vector regressor (SVR). We validate the effectiveness of the proposed models on three databases. The experimental results show that, with the help of audio signals, the VQA models evaluate perceptual quality more accurately. The database will be released to facilitate further research.
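The fusion step described above can be illustrated with a minimal sketch. It assumes that per-sequence video-quality features and audio features are already available as arrays, that fusion is a simple concatenation, and that scikit-learn's `SVR` stands in for the regressor. The feature dimensions, hyperparameters, the helper names `fuse_features` and `train_avqa_regressor`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def fuse_features(video_feats, audio_feats):
    """Concatenate per-sequence video and audio feature vectors (assumed fusion)."""
    return np.concatenate([video_feats, audio_feats], axis=1)


def train_avqa_regressor(video_feats, audio_feats, mos):
    """Fit an SVR that maps fused A/V features to mean opinion scores."""
    # Hyperparameters here are illustrative defaults, not tuned values.
    model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
    model.fit(fuse_features(video_feats, audio_feats), mos)
    return model


# Synthetic placeholder data: 60 sequences, 8 video and 4 audio features each.
rng = np.random.default_rng(0)
n_sequences = 60
video_feats = rng.normal(size=(n_sequences, 8))
audio_feats = rng.normal(size=(n_sequences, 4))
mos = rng.uniform(1.0, 5.0, size=n_sequences)

# Train on the first 48 sequences and predict quality for the remaining 12.
model = train_avqa_regressor(video_feats[:48], audio_feats[:48], mos[:48])
predicted = model.predict(fuse_features(video_feats[48:], audio_feats[48:]))
print(predicted.shape)
```

With real data, the predicted scores would typically be compared against the held-out MOS using rank and linear correlation coefficients. The random features above carry no signal, so the predictions here are only a check that the pipeline runs.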