The rise of video streaming applications has increased the demand for video quality assessment (VQA). In 2016, Netflix introduced Video Multi-Method Assessment Fusion (VMAF), a full-reference VQA metric that correlates strongly with perceptual quality but is time-intensive to compute. We propose VQ-TIF, a Discrete Cosine Transform (DCT)-energy-based VQA model with texture information fusion, which determines the visual quality of a reconstructed video relative to the original video in streaming applications. VQ-TIF extracts Structural Similarity (SSIM) and spatiotemporal features from the frames of the original and reconstructed videos and fuses them using a long short-term memory (LSTM)-based model to estimate the visual quality. Experimental results show that, measured against ground-truth VMAF scores, VQ-TIF estimates the visual quality with an average Pearson Correlation Coefficient (PCC) of 0.96 and an average Mean Absolute Error (MAE) of 2.71. Additionally, assuming an Ultra HD (2160p) display resolution, VQ-TIF estimates the visual quality 9.14 times faster than the state-of-the-art VMAF implementation while reducing energy consumption by 89.44%.
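For readers unfamiliar with the two evaluation measures, the following minimal sketch shows how a PCC and an MAE between predicted and ground-truth VMAF scores could be computed. The function names and the score values are hypothetical and chosen only for illustration; they are not data or code from this work.

```python
import numpy as np


def pearson_cc(pred, truth):
    """Pearson Correlation Coefficient between two equal-length score lists."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    pred_c = pred - pred.mean()    # center the predictions
    truth_c = truth - truth.mean() # center the ground truth
    return float((pred_c * truth_c).sum()
                 / np.sqrt((pred_c ** 2).sum() * (truth_c ** 2).sum()))


def mean_abs_error(pred, truth):
    """Mean Absolute Error between two equal-length score lists."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.abs(pred - truth).mean())


# Hypothetical per-sequence scores on the 0-100 VMAF scale (illustration only).
vmaf_truth = [35.0, 52.5, 71.0, 88.0, 95.5]
vmaf_pred = [37.0, 50.0, 73.5, 86.0, 97.0]

pcc = pearson_cc(vmaf_pred, vmaf_truth)
mae = mean_abs_error(vmaf_pred, vmaf_truth)
print(f"PCC = {pcc:.4f}, MAE = {mae:.2f}")
```

A PCC close to 1 indicates that the predicted scores track the ground truth linearly, while the MAE expresses the typical deviation in VMAF points.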