Video Multimethod Assessment Fusion (VMAF) [1], [2], [3] is a popular industry tool for measuring the quality of coded video. In this study, we propose an auditory-inspired frontend for the existing VMAF that creates videos of reference and coded spectrograms, and we extend VMAF to measure coded audio quality. We name our system AudioVMAF. We demonstrate that image replication can further enhance prediction accuracy, especially when band-limited anchors are present. The proposed method significantly outperforms all existing visual quality features repurposed for audio, and even demonstrates a significant overall improvement of 7.8% and 2.0% in Pearson and Spearman rank correlation coefficients, respectively, over a dedicated audio quality metric (ViSQOL-v3 [4]) that is also inspired by the image domain.
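The two figures of merit named in the abstract, the Pearson and Spearman rank correlation coefficients, are typically computed between a metric's predicted scores and the corresponding listening-test scores. The following is a minimal sketch of that computation, not the evaluation code used in this study. The function names and the sample scores are hypothetical and serve only as illustration.

```python
import numpy as np


def pearson(x, y):
    """Pearson linear correlation coefficient between two score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical data: listening-test scores and a metric's predictions.
    subjective = [20.0, 35.0, 50.0, 80.0, 95.0]
    predicted = [30.0, 28.0, 55.0, 70.0, 99.0]
    print("Pearson :", round(pearson(subjective, predicted), 3))
    print("Spearman:", round(spearman(subjective, predicted), 3))
```

Pearson correlation rewards a linear relationship between predictions and subjective scores, whereas Spearman correlation depends only on how well the metric preserves their rank order. This is why the two coefficients are reported separately.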