We present GMLv2, a reference-based model for predicting subjective audio quality as measured by MUSHRA scores. GMLv2 introduces a Beta distribution-based loss to model listener ratings and incorporates additional neural audio coding (NAC) subjective datasets to extend its generalization and applicability. Extensive evaluations on diverse test sets demonstrate that GMLv2 consistently outperforms widely used metrics, such as PEAQ and ViSQOL, both in correlation with subjective scores and in reliably predicting these scores across diverse content types and codec configurations. Consequently, GMLv2 offers a scalable and automated framework for perceptual audio quality evaluation, poised to accelerate research and development in modern audio coding technologies.
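To make the idea of a Beta distribution-based loss concrete, the following is a minimal sketch of a generic Beta negative log-likelihood applied to a single MUSHRA rating. It is not the formulation used by GMLv2, which the abstract does not specify. The function names `beta_nll` and `beta_mean_score`, the rescaling of the 0 to 100 MUSHRA range to the open interval (0, 1), and the clamping constant `eps` are all assumptions made for illustration.

```python
import math


def beta_nll(score, alpha, beta, eps=1e-4):
    """Negative log-likelihood of one rating under Beta(alpha, beta).

    Hypothetical helper: `score` is a MUSHRA rating in [0, 100];
    `alpha` and `beta` stand in for the shape parameters a model would predict.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    # Assumption: rescale 0..100 to (0, 1) and clamp away from the
    # endpoints, where the Beta log-density can be infinite.
    y = min(max(score / 100.0, eps), 1.0 - eps)
    # log B(alpha, beta), the log of the Beta normalizing constant
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_pdf = (alpha - 1.0) * math.log(y) + (beta - 1.0) * math.log(1.0 - y) - log_norm
    return -log_pdf


def beta_mean_score(alpha, beta):
    """Mean of Beta(alpha, beta), mapped back to the 0..100 MUSHRA scale."""
    return 100.0 * alpha / (alpha + beta)


if __name__ == "__main__":
    # A rating of 80 is far more likely under Beta(8, 2) than under Beta(2, 8),
    # so the first loss value is the smaller of the two.
    print(beta_nll(80, 8.0, 2.0))
    print(beta_nll(80, 2.0, 8.0))
    print(beta_mean_score(8.0, 2.0))
```

One plausible motivation for a loss of this kind is that listener ratings are bounded and often skewed toward the ends of the scale, which a Beta density can represent and a symmetric Gaussian error model cannot. In a training setup, the loss would typically be averaged over all listener ratings for an item.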