As adaptive streaming becomes crucial for delivering high-quality video across diverse network conditions, accurate metrics for assessing perceptual quality are essential. This paper explores the eXtended Peak Signal-to-Noise Ratio (XPSNR) as an alternative to the popular Video Multimethod Assessment Fusion (VMAF) metric for determining optimized bitrate-resolution pairs in the context of Versatile Video Coding (VVC). Our study is motivated by the observation that XPSNR correlates better with subjective quality scores than VMAF does for VVC-coded Ultra-High Definition (UHD) content. We predict the average XPSNR of VVC-coded bitstreams from spatiotemporal complexity features of the video and the target encoding configuration, and then determine the convex hull online. On average, the proposed XPSNR-based convex hull (VEXUS) improves overall quality by 5.84 dB PSNR and 0.62 dB XPSNR at the same bitrate compared to the default UHD encoding with the VVenC encoder, while reducing encoding time by 44.43% and decoding time by 65.46%. This shift towards XPSNR as a guiding metric is expected to enhance the effectiveness of adaptive streaming algorithms by better balancing bitrate efficiency and perceptual fidelity with advanced video coding standards.
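The selection step described above, choosing a resolution for each target bitrate from predicted XPSNR scores, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the prediction model, so `predict_xpsnr` is a hypothetical stand-in backed by a made-up lookup table, and the function name `select_ladder`, the `min_gain_db` parameter, and all numeric values are assumptions for illustration only. The sketch takes the per-bitrate maximum over resolutions (the upper envelope of the predicted rate-quality curves) rather than computing a geometric convex hull.

```python
from typing import Callable, List, Tuple


def select_ladder(
    bitrates_kbps: List[int],
    resolutions: List[int],
    predict_xpsnr: Callable[[int, int], float],
    min_gain_db: float = 0.0,
) -> List[Tuple[int, int, float]]:
    """Pick, for each target bitrate, the resolution with the highest
    predicted XPSNR. A point is kept only if it beats the previously
    kept point by more than min_gain_db (hypothetical pruning rule)."""
    ladder: List[Tuple[int, int, float]] = []
    best_so_far = float("-inf")
    for bitrate in sorted(bitrates_kbps):
        # Resolution that maximizes predicted quality at this bitrate.
        best_res = max(resolutions, key=lambda r: predict_xpsnr(r, bitrate))
        quality = predict_xpsnr(best_res, bitrate)
        if quality > best_so_far + min_gain_db:
            ladder.append((bitrate, best_res, quality))
            best_so_far = quality
    return ladder


# Hypothetical predicted XPSNR values in dB, keyed by (height, bitrate in kbps).
# In practice these would come from a model driven by spatiotemporal
# complexity features and the target encoding configuration.
TOY_PREDICTIONS = {
    (540, 1000): 34.0, (1080, 1000): 32.5, (2160, 1000): 30.0,
    (540, 4000): 36.0, (1080, 4000): 38.0, (2160, 4000): 37.0,
    (540, 16000): 36.5, (1080, 16000): 40.0, (2160, 16000): 42.0,
}


def toy_predictor(height: int, bitrate_kbps: int) -> float:
    """Stand-in for the XPSNR prediction model (lookup in the toy table)."""
    return TOY_PREDICTIONS[(height, bitrate_kbps)]


ladder = select_ladder([1000, 4000, 16000], [540, 1080, 2160], toy_predictor)
for bitrate, height, quality in ladder:
    print(f"{bitrate} kbps -> {height}p (predicted XPSNR {quality:.1f} dB)")
```

With the toy values, lower resolutions win at low bitrates and UHD wins only at the highest bitrate, which is the behaviour a bitrate-resolution ladder is meant to capture. Because the scores are predicted rather than measured, no trial encodes are needed at selection time.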