The rise of video streaming applications has increased the demand for Video Quality Assessment (VQA). In 2016, Netflix introduced VMAF, a full-reference VQA metric that correlates strongly with perceptual quality, but its computation is time-intensive. This paper proposes a Discrete Cosine Transform (DCT)-energy-based VQA model with texture information fusion (VQ-TIF) for video streaming applications. VQ-TIF predicts the VMAF score of a reconstructed video with respect to the original video. It extracts Structural Similarity (SSIM) and spatio-temporal features from the frames of the original and reconstructed videos and fuses them using a Long Short-Term Memory (LSTM)-based model to estimate VMAF. Experimental results show that, on average, VQ-TIF estimates VMAF with a Pearson Correlation Coefficient (PCC) of 0.96 and a Mean Absolute Error (MAE) of 2.71 relative to the ground-truth VMAF scores. Additionally, assuming an Ultra HD (2160p) display resolution, VQ-TIF estimates VMAF 9.14 times faster than the state-of-the-art VMAF implementation, with an 89.44% reduction in energy consumption.
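As a rough illustration of the DCT-energy texture features and the two evaluation metrics named above, the following plain-Python sketch may help. It is not the VQ-TIF implementation. The function names (`dct2`, `block_energies`, `spatial_energy`, `temporal_energy`, `pcc`, `mae`) are hypothetical. The block size, the use of an unweighted sum of absolute AC coefficients as per-block texture energy, and the mean absolute block-energy difference between consecutive frames as temporal energy are all simplifying assumptions. SSIM extraction and the LSTM fusion stage are omitted.

```python
import math

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        # Orthonormal scaling factor for coefficient index k.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * s
    return out

def block_energies(frame, bs=4):
    """Assumed per-block texture energy: sum of |AC| DCT coefficients (DC excluded)."""
    h, w = len(frame), len(frame[0])
    energies = []
    for r in range(0, h - bs + 1, bs):
        for c in range(0, w - bs + 1, bs):
            block = [row[c:c + bs] for row in frame[r:r + bs]]
            coeffs = dct2(block)
            energies.append(sum(abs(coeffs[u][v])
                                for u in range(bs) for v in range(bs)
                                if (u, v) != (0, 0)))
    return energies

def spatial_energy(frame, bs=4):
    """Frame-level spatial texture energy: mean of the block energies."""
    e = block_energies(frame, bs)
    return sum(e) / len(e)

def temporal_energy(prev, cur, bs=4):
    """Assumed temporal feature: mean absolute change in block energy between frames."""
    ep, ec = block_energies(prev, bs), block_energies(cur, bs)
    return sum(abs(a - b) for a, b in zip(ep, ec)) / len(ec)

def pcc(x, y):
    """Pearson correlation coefficient between predicted and ground-truth scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def mae(x, y):
    """Mean absolute error between predicted and ground-truth scores."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

if __name__ == "__main__":
    flat = [[128] * 8 for _ in range(8)]
    checker = [[255 * ((r + c) % 2) for c in range(8)] for r in range(8)]
    print(spatial_energy(flat), spatial_energy(checker))
    print(temporal_energy(flat, checker))
    print(pcc([1, 2, 3, 4], [2, 4, 6, 8]), mae([1, 2, 3], [2, 2, 5]))
```

A flat frame has (numerically) zero AC energy, while a checkerboard produces a large value. This is the intuition behind using DCT energy as a texture descriptor. In a model of this kind, such per-frame features would be fed as a sequence to the learned fusion stage. PCC and MAE would then be computed between its predictions and the ground-truth VMAF scores.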